
492 points vladyslavfox | 4 comments
trompetenaccoun ◴[] No.41895988[source]
We need archives built on decentralized storage. Don't get me wrong, I really like and support the work Internet Archive is doing, but preserving history is too important to entrust it solely to singular entities, which means singular points of failure.
replies(19): >>41896170 #>>41896389 #>>41896411 #>>41896420 #>>41897459 #>>41897680 #>>41897913 #>>41898320 #>>41898841 #>>41899160 #>>41899729 #>>41899779 #>>41899999 #>>41900368 #>>41901199 #>>41902340 #>>41904676 #>>41905019 #>>41907926 #
stavros ◴[] No.41900368[source]
I designed a system where you could say "donate this spare 2 TB of my disk space to the Internet Archive" and the IA would push 2 TB of data to you. This system also has the property that it can be reconstructed if the IA (or whatever provider) goes away.

Unfortunately, when I talked to a few archival teams (including the IA) about whether they'd be interested in using it, I either got no response or a negative one.
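
To make the "donate 2 TB and receive 2 TB of data" idea concrete, here is a minimal sketch of how pledged space might be matched to archive items. It is not the system described above, whose design isn't spelled out in this thread. The function name `assign_items`, the item catalogue, the volunteer names, and the choice of two copies per item are all hypothetical.

```python
def assign_items(items, pledges, copies=2):
    """Place `copies` replicas of each item on distinct volunteers.

    items:   dict of item id -> size in GB (hypothetical catalogue)
    pledges: dict of volunteer id -> pledged space in GB (hypothetical)
    Returns a dict of volunteer id -> list of item ids they should store.
    """
    free = dict(pledges)                      # remaining space per volunteer
    placement = {v: [] for v in pledges}

    # Place the largest items first so small ones don't crowd them out.
    for item, size in sorted(items.items(), key=lambda kv: -kv[1]):
        # Volunteers with enough room, most free space first (name breaks ties).
        candidates = sorted(
            (v for v in free if free[v] >= size),
            key=lambda v: (-free[v], v),
        )
        if len(candidates) < copies:
            raise ValueError(f"not enough pledged space for {item!r}")
        for v in candidates[:copies]:
            free[v] -= size
            placement[v].append(item)
    return placement


if __name__ == "__main__":
    items = {"a": 1000, "b": 600, "c": 400}
    pledges = {"alice": 2000, "bob": 2000, "carol": 1000}
    print(assign_items(items, pledges))
```

Since the provider keeps the placement map, the same map (published somewhere durable) would be what lets others find the pieces again if the provider disappears. That part is an assumption on top of the sketch, not something stated above.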

replies(4): >>41900972 #>>41902214 #>>41902442 #>>41904379 #
TZubiri ◴[] No.41904379[source]
Hosting something on a volunteer's drive, without any guarantees, is pretty useless. They can stop hosting, or suffer disk damage, and you lose the data.
replies(1): >>41904787 #
stavros ◴[] No.41904787[source]
You're essentially saying that having an extra copy of the data is no more reliable than not having one. I would encourage you to think about this a bit more.
replies(1): >>41905250 #
TZubiri ◴[] No.41905250[source]
For these parameters, yes.

If you have a RAID, then you have 2 copies with something like 99.99% availability and a mean time to failure of about 5 years.

With a volunteer drive you have what, ?% availability and ? years to failure? You can't depend on it.

Also, the average value of data is very low; you don't want to be making many copies of it for no reason.

replies(1): >>41905351 #
stavros ◴[] No.41905351[source]
That would mean that even with a million volunteer drives storing a file, you still wouldn't be able to depend on them, which is plainly wrong.

> Also, the average value of data is very low; you don't want to be making many copies of it for no reason.

The reason is that the value of that data is high to the archivist, since they want to preserve it.
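
A back-of-the-envelope way to see the point about many unreliable copies, as a sketch only: it assumes each volunteer copy is lost independently with the same probability, which real drives and real volunteers won't strictly satisfy. The 50% loss figure is a made-up pessimistic number and the function names are hypothetical.

```python
import math


def loss_probability(p_fail, copies):
    """Chance that every copy is gone, if each is lost independently
    with probability p_fail (an assumption, not a measured figure)."""
    return p_fail ** copies


def copies_needed(p_fail, target):
    """Smallest number of independent copies whose combined loss
    probability is at or below `target`."""
    if not (0 < p_fail < 1 and 0 < target < 1):
        raise ValueError("p_fail and target must be strictly between 0 and 1")
    return math.ceil(math.log(target) / math.log(p_fail))


if __name__ == "__main__":
    # Pessimistic guess: each volunteer has a 50% chance of vanishing.
    print(loss_probability(0.5, 2))    # two flaky copies
    print(copies_needed(0.5, 1e-6))    # copies for a one-in-a-million loss
```

Under those assumptions, even copies that each fail half the time get below a one-in-a-million chance of total loss at around twenty replicas, since 0.5 ** 20 is roughly 9.5e-7. Correlated failures (everyone losing interest at once, say) would make the real number worse.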

replies(1): >>41906362 #
TZubiri ◴[] No.41906362[source]
A million is out of the parameters of the case.

Realistically, you won't get enough volunteer storage to cover even one IA. And even if you did, it wouldn't satisfy the mission requirement, which is to store all of the data reliably for decades.

replies(1): >>41906413 #
1. stavros ◴[] No.41906413[source]
This isn't meant to be storage for IA, it's meant to be a distributed backup.
replies(1): >>41907552 #
2. TZubiri ◴[] No.41907552[source]
Ah, my bad, so it's not a replacement for IA. In that case it makes sense.
replies(1): >>41908483 #
3. stavros ◴[] No.41908483[source]
Yes, the idea is that this is a replacement for the torrents they make public. In case the IA goes away, we'll have this distributed dataset to fall back on.
replies(1): >>41909221 #
4. TZubiri ◴[] No.41909221{3}[source]
An archive of an archive