Internet Archive breached again through stolen access tokens

(www.bleepingcomputer.com)

trompetenaccoun ◴[20 Oct 24 15:33 UTC] No.41895988[source]▶

We need archives built on decentralized storage. Don't get me wrong, I really like and support the work Internet Archive is doing, but preserving history is too important to entrust it solely to singular entities, which means singular points of failure.

replies(19): >>41896170 #>>41896389 #>>41896411 #>>41896420 #>>41897459 #>>41897680 #>>41897913 #>>41898320 #>>41898841 #>>41899160 #>>41899729 #>>41899779 #>>41899999 #>>41900368 #>>41901199 #>>41902340 #>>41904676 #>>41905019 #>>41907926 #

1. stavros ◴[21 Oct 24 03:11 UTC] No.41900368[source]▶

>>41895988 #

I designed a system where you could say "donate this spare 2 TB of my disk space to the Internet Archive" and the IA would push 2 TB of data to you. This system also has the property that it can be reconstructed if the IA (or whatever provider) goes away.

Unfortunately, when I talked to a few archival teams (including the IA) about whether they'd be interested in using it, I either got no response or a negative one.

replies(4): >>41900972 #>>41902214 #>>41902442 #>>41904379 #
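To make the "donate 2 TB and have data pushed to you" idea concrete, here is a minimal sketch of how one pledge could be filled. It is not taken from the design described above: the `Item` type, the `fill_pledge` function, and the least-replicated-first policy are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One archived item (hypothetical model, not the real IA data layout)."""
    name: str
    size_gb: int
    replicas: int = 0  # how many volunteers already hold a copy


def fill_pledge(items: list[Item], pledge_gb: int) -> list[Item]:
    """Choose items for one volunteer's pledged space.

    Assumed policy: hand out the least-replicated items first, skipping
    anything that no longer fits in the remaining pledged space.
    """
    chosen = []
    remaining = pledge_gb
    for item in sorted(items, key=lambda i: (i.replicas, i.name)):
        if item.size_gb <= remaining:
            chosen.append(item)
            remaining -= item.size_gb
            item.replicas += 1  # the volunteer now holds one more copy
    return chosen


if __name__ == "__main__":
    catalog = [
        Item("a", 1500, replicas=2),
        Item("b", 1200),
        Item("c", 700),
        Item("d", 300, replicas=1),
    ]
    print([item.name for item in fill_pledge(catalog, pledge_gb=2000)])
```

A real system would also need to verify that volunteers still hold what they were sent and to re-replicate when they drop out; none of that is modeled here.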

2. 4gotunameagain ◴[21 Oct 24 05:38 UTC] No.41900972[source]▶

>>41900368 (TP) #

Why reinvent the wheel?

There are so many proven distributed archiving systems, a lot of which are mentioned in these comments.

replies(1): >>41902360 #

3. whywhywhywhy ◴[21 Oct 24 09:25 UTC] No.41902214[source]▶

>>41900368 (TP) #

Because the incentive will be archiving things they believe should be archived. You need the process to begin with which URLs people want to archive. Then people will be incentivized to archive the juicy stuff the IA is used for, and you can throw in some stuff they didn't ask to archive as part of the deal for storing what they want.

replies(1): >>41903395 #

4. stavros ◴[21 Oct 24 09:44 UTC] No.41902360[source]▶

>>41900972 #

What system of these allows me to donate a bunch of disk space to a provider of my choosing, without thinking about it afterwards?

5. boramalper ◴[21 Oct 24 09:57 UTC] No.41902442[source]▶

>>41900368 (TP) #

Is this open source or do you have any design docs? I love the idea and would love to learn more about it.

replies(1): >>41902462 #

6. stavros ◴[21 Oct 24 10:00 UTC] No.41902462[source]▶

>>41902442 #

The idea is that it'll be open source. I have a rough design doc here:

https://docs.google.com/document/d/1qKgIjUTef-I-BLWjn4sEIbYo...

I'll write up a more detailed article on it, though, it'll be good to at least have the doc public somewhere.

7. stavros ◴[21 Oct 24 12:24 UTC] No.41903395[source]▶

>>41902214 #

I don't think that's necessarily true. I have a spare TB that I'd be glad to donate to the IA to store whatever they want in.

8. TZubiri ◴[21 Oct 24 14:11 UTC] No.41904379[source]▶

>>41900368 (TP) #

Hosting something on a volunteer's drive, without any guarantees, is pretty useless. They can cease hosting, or have disk damage, and you lose data.

replies(1): >>41904787 #

9. stavros ◴[21 Oct 24 14:48 UTC] No.41904787[source]▶

>>41904379 #

You're essentially saying that having an extra copy of data is equally as reliable as not having an extra copy of data. I would encourage you to think about this a bit more.

replies(1): >>41905250 #

10. TZubiri ◴[21 Oct 24 15:30 UTC] No.41905250{3}[source]▶

>>41904787 #

For these parameters, yes.

If you have a RAID, then you have 2 copies with like 99.99% availability and a mean time to failure of 5 years.

With a volunteer drive you have like ?% availability and ? years to failure. You can't depend on it.

Also the average value of data is very low, you don't want to be making many copies of it for no reason.

replies(1): >>41905351 #

11. stavros ◴[21 Oct 24 15:41 UTC] No.41905351{4}[source]▶

>>41905250 #

That would mean that even with a million volunteer drives storing a file, you still wouldn't be able to depend on them, which is plainly wrong.

> Also the average value of data is very low, you don't want to be making many copies of it for no reason.

The reason is that the value of that data is high to the archivist, since they want to preserve it.

replies(1): >>41906362 #
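The disagreement above about whether unreliable copies add up to something dependable can be checked with rough arithmetic. The sketch below assumes each copy is lost independently with the same probability over some period. That is a strong assumption (volunteers may drop out together), and the function names and numbers are illustrative, not measurements.

```python
def loss_probability(p_copy_lost: float, copies: int) -> float:
    """Probability that all `copies` independent replicas are lost."""
    if not 0.0 <= p_copy_lost <= 1.0:
        raise ValueError("p_copy_lost must be between 0 and 1")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return p_copy_lost ** copies


def copies_needed(p_copy_lost: float, target_loss: float) -> int:
    """Smallest number of independent replicas whose joint loss
    probability is at most `target_loss`."""
    if not 0.0 <= p_copy_lost < 1.0:
        raise ValueError("p_copy_lost must be in [0, 1)")
    if not 0.0 < target_loss <= 1.0:
        raise ValueError("target_loss must be in (0, 1]")
    copies = 0
    probability = 1.0  # with zero copies the data is certainly gone
    while probability > target_loss:
        probability *= p_copy_lost
        copies += 1
    return copies


if __name__ == "__main__":
    # Assume a volunteer copy has a 50% chance of vanishing in the period.
    print(loss_probability(0.5, 3))
    print(copies_needed(0.5, 0.001))
```

Under these assumptions, even copies that each have a coin-flip chance of disappearing reach a one-in-a-thousand loss rate with a handful of replicas. The single unreliable copy is the weak case; many independent ones are not.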

12. TZubiri ◴[21 Oct 24 17:34 UTC] No.41906362{5}[source]▶

>>41905351 #

A million is out of the parameters of the case.

Realistically you won't get enough volunteer storage to cover one IA. And even if you did, it wouldn't satisfy the mission requirement, which is to store all of the data reliably for decades.

replies(1): >>41906413 #

13. stavros ◴[21 Oct 24 17:39 UTC] No.41906413{6}[source]▶

>>41906362 #

This isn't meant to be storage for IA, it's meant to be a distributed backup.

replies(1): >>41907552 #

14. TZubiri ◴[21 Oct 24 19:28 UTC] No.41907552{7}[source]▶

>>41906413 #

Ah my bad, so it's not a replacement for the IA. In that case it makes sense.

replies(1): >>41908483 #

15. stavros ◴[21 Oct 24 21:00 UTC] No.41908483{8}[source]▶

>>41907552 #

Yes, the idea is that this is a replacement for the torrents they make public. In case the IA goes away, we'll have this distributed dataset to fall back on.

replies(1): >>41909221 #

16. TZubiri ◴[21 Oct 24 22:26 UTC] No.41909221{9}[source]▶

>>41908483 #

An archive of an archive
