492 points by vladyslavfox | 16 comments

trompetenaccoun · No.41895988
We need archives built on decentralized storage. Don't get me wrong, I really like and support the work the Internet Archive is doing, but preserving history is too important to entrust solely to a single entity, which is a single point of failure.
1. stavros · No.41900368
I designed a system where you could say "donate this spare 2 TB of my disk space to the Internet Archive" and the IA would push 2 TB of data to you. This system also has the property that it can be reconstructed if the IA (or whatever provider) goes away.

Unfortunately, when I talked to a few archival teams (including the IA) about whether they'd be interested in using it, I either got no response or a negative one.
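To make the "donate 2 TB and let the IA fill it" idea a bit more concrete, here is a minimal sketch of one way a coordinator *could* place content-addressed chunks on volunteers according to their pledged space, copying each chunk to several donors so the collection can be rebuilt from the donors alone if the original host disappears. Everything here (the hypothetical `Volunteer` class, greedy most-free-space placement, SHA-256 chunk IDs, the replica count) is an illustrative assumption, not stavros's actual design.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Volunteer:
    """Hypothetical donor who has pledged `capacity` bytes of disk."""
    name: str
    capacity: int
    used: int = 0
    chunks: set[str] = field(default_factory=set)

    def free(self) -> int:
        return self.capacity - self.used


def chunk_id(data: bytes) -> str:
    """Content address: SHA-256 of the chunk, so any holder's copy is verifiable."""
    return hashlib.sha256(data).hexdigest()


def assign(chunks: dict[str, int], volunteers: list[Volunteer],
           replicas: int) -> dict[str, list[str]]:
    """Greedily place each chunk (id -> size), largest first, on up to
    `replicas` distinct volunteers with the most free space.
    Returns chunk id -> names of holders (fewer than `replicas` if space runs out)."""
    placement: dict[str, list[str]] = {}
    for cid, size in sorted(chunks.items(), key=lambda kv: -kv[1]):
        candidates = sorted(
            (v for v in volunteers if v.free() >= size),
            key=lambda v: v.free(),
            reverse=True,
        )[:replicas]
        for v in candidates:
            v.used += size
            v.chunks.add(cid)
        placement[cid] = [v.name for v in candidates]
    return placement


def recoverable(chunk_ids: set[str], volunteers: list[Volunteer]) -> bool:
    """True if every chunk is still held by at least one remaining volunteer,
    i.e. the collection could be rebuilt without the original host."""
    held = set().union(*(v.chunks for v in volunteers))
    return chunk_ids <= held


if __name__ == "__main__":
    blobs = [b"page-1", b"page-2", b"page-3"]
    chunks = {chunk_id(b): len(b) for b in blobs}
    vols = [Volunteer("a", 12), Volunteer("b", 12), Volunteer("c", 12)]
    placement = assign(chunks, vols, replicas=2)
    print({cid[:8]: holders for cid, holders in placement.items()})
    # Drop volunteer "a": every chunk still has a surviving copy.
    print(recoverable(set(chunks), vols[1:]))  # True
```

With two replicas per chunk, losing any single donor still leaves a full copy of the collection; a real system would also need to re-replicate when donors leave, which this sketch does not attempt.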

2. 4gotunameagain · No.41900972
Why reinvent the wheel?

There are so many proven distributed archiving systems, a lot of which are mentioned in these comments.

3. whywhywhywhy · No.41902214
Because people's incentive will be to archive things they believe should be archived. The process needs to start with which URLs you want to archive; then people will be motivated to archive the juicy stuff the IA is used for, and you can bundle in some things they didn't ask to archive as part of storing what they want.
4. stavros · No.41902360
Which of these systems lets me donate a bunch of disk space to a provider of my choosing, without having to think about it afterwards?
5. boramalper · No.41902442
Is this open source or do you have any design docs? I love the idea and would love to learn more about it.
6. stavros · No.41902462
The idea is that it'll be open source; I have a rough design doc here:

https://docs.google.com/document/d/1qKgIjUTef-I-BLWjn4sEIbYo...

I'll write up a more detailed article on it, though; it'll be good to at least have the doc public somewhere.

7. stavros · No.41903395
I don't think that's necessarily true. I have a spare TB that I'd be glad to donate to the IA to store whatever they want in.
8. TZubiri · No.41904379
Hosting something on a volunteer's drive, without any guarantees, is pretty useless. They can stop hosting, or their disk can get damaged, and you lose data.
9. stavros · No.41904787
You're essentially saying that having an extra copy of data is just as reliable as not having an extra copy of data. I would encourage you to think about this a bit more.
10. TZubiri · No.41905250
For these parameters, yes.

If you have a RAID, then you have 2 copies with something like 99.99% availability and a mean time to failure of around 5 years.

With a volunteer drive you have ?% availability and ? years to failure. You can't depend on it.

Also, the average value of data is very low; you don't want to be making many copies of it for no reason.

11. stavros · No.41905351
That would mean that even with a million volunteer drives storing a file, you still wouldn't be able to depend on them, which is plainly wrong.

> Also, the average value of data is very low; you don't want to be making many copies of it for no reason.

The reason is that the value of that data is high to the archivist, since they want to preserve it.
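The disagreement over whether unreliable volunteer copies are worth anything comes down to how redundancy compounds. A back-of-the-envelope sketch, assuming each copy is lost independently with the same probability p over some period, so that all n copies are lost with probability p^n. Independence is a strong assumption (drives share failure modes, and volunteers may drop out together), and the function names and the 50% loss rate below are hypothetical illustrations, not measurements.

```python
def prob_all_copies_lost(p_loss: float, copies: int) -> float:
    """Probability that every one of `copies` independent replicas is lost,
    given each is lost with probability `p_loss` over the same period."""
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be in [0, 1]")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return p_loss ** copies


def copies_needed(p_loss: float, target: float) -> int:
    """Smallest number of independent replicas whose joint loss
    probability is at or below `target`."""
    if not (0.0 < p_loss < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p_loss and target must be in (0, 1)")
    n = 1
    while prob_all_copies_lost(p_loss, n) > target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical: each volunteer copy has a 50% chance of vanishing per year.
    for n in (1, 2, 10, 20):
        print(n, prob_all_copies_lost(0.5, n))
    print(copies_needed(0.5, 1e-6))  # 20
```

Under these assumptions, even a coin-flip chance of losing each individual copy drops below one-in-a-million total loss at about 20 copies; correlated failures would make the real figure worse, which is the part of the objection that still stands.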

12. TZubiri · No.41906362
A million is outside the parameters of this case.

Realistically, you won't get enough volunteer storage to cover even one IA. And even if you did, it wouldn't satisfy the mission requirements, which are to store all of the data reliably for decades.

13. stavros · No.41906413
This isn't meant to be storage for IA, it's meant to be a distributed backup.
14. TZubiri · No.41907552
Ah, my bad, so it's not a replacement for the IA. In that case it makes sense.
15. stavros · No.41908483
Yes, the idea is that this is a replacement for the torrents they make public. In case the IA goes away, we'll have this distributed dataset to fall back on.
16. TZubiri · No.41909221
An archive of an archive