This is a brilliant system relying on a randomised consensus protocol. I wanted to do my info sec dissertation on it, but its security model is extremely well thought out. There wasn't anything I felt I could add to it.
The IA has tried distributing their stores, but nowhere near enough people actually put their storage where their mouths are.
As for technical attacks, I'm not an expert but I'd assume it's more difficult for bad actors to bring down decentralized networks. Has the BitTorrent network ever gone offline because it was hacked, for example? That seems like it would be extremely hard to do; not even the movie industry managed to take them down.
Typically because most people who have the upload capacity don't know that they can. And if they come to the notion on their own, they won't know how.
If they put the notion to a search engine, the keywords they come up with probably don't return the needed ELI5 page.
As in: "How do I [?] for the Internet Archive?" Most folks won't know what [?] needs to be.
The design is really very good.
If different data always gets a different reference, it's easy to know if you have enough backups of it. If the same name gets you a pile of snapshots taken under different conditions, it's hard to be sure which of those is the one that we'd want to back up for that particular name.
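To make that concrete, here's a minimal sketch of content addressing (plain SHA-256 standing in for a real CID scheme):

    import hashlib

    def content_id(data: bytes) -> str:
        # The reference is derived from the bytes themselves, so
        # identical data always maps to the identical ID.
        return hashlib.sha256(data).hexdigest()

    snapshot_a = b"<html>archived page, crawl 1</html>"
    snapshot_b = b"<html>archived page, crawl 2</html>"

    print(content_id(snapshot_a) == content_id(snapshot_a))  # True: same bytes, same reference
    print(content_id(snapshot_a) == content_id(snapshot_b))  # False: any change means a new reference

Counting backups of a reference is then just counting the distinct nodes that hold those exact bytes.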
For a large-scale archival project, it might not be ideal. Maybe something based on erasure coding would be better. Do you know how LOCKSS compares?
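To be clear on what erasure coding buys you, here's a toy single-parity scheme where any one of the stored chunks can be lost and rebuilt. Real archival systems would use something like Reed-Solomon to survive several losses, and as far as I know LOCKSS itself relies on many whole replicas plus integrity polling rather than erasure codes:

    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    chunks = [b"aaaa", b"bbbb", b"cccc"]  # equal-sized data chunks
    parity = reduce(xor, chunks)          # store chunks + parity on different nodes

    # Lose any single chunk; XOR-ing the survivors with the parity rebuilds it:
    rebuilt = reduce(xor, [chunks[0], chunks[2], parity])
    assert rebuilt == chunks[1]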
I was looking into using R2 as a web seed for the torrent but I don't _really_ want to spend much to upload content that is going to get "stolen" and reuploaded by content farms anyway, you know?
* What is a "bird famine", and did one happen in 1880?
* Did any astrologer ever claim that the constellations "remember" the areas of the sky, and hence zodiac signs, that they belonged to in ancient times before precession shifted them around?
* Who first said "psychology is pulling habits out of rats", and in what context? (That one's on Wikiquote now, but only because I put it there after research on IA.)
Or consider the recently rediscovered Bram Stoker short story. That was found in an actual library, but only because the library kept copies of old Irish newspapers instead of lining cupboards with them.
The necessary documents to answer highly specific questions are very boring, and nobody has any reason to like them.
https://github.com/internetarchive/dweb-archive/blob/master/...
History has always gotten rewritten. If you have a giant library, it's easier for bad actors to gain influence and alter certain books, or remove them. This isn't just theoretical: under external pressure, IA has already removed sites from its archive for copyright and political reasons.
There are also threats that are generally not even considered because they happen so rarely, but when they happen they're devastating. The Library of Alexandria was burned by Julius Caesar during a war. Likewise, if all your servers are in one country, that's a geographic risk: they can get destroyed in the event of a war or the like. No one expects this to happen today in the US, but archives should be robust long term, for decades, ideally even centuries.
What are some legal torrent trackers?
(this doc is 5-6 years old though, and I'm not sure what may have changed since then)
In my own (toy-scale) IPFS experiments a couple of years ago it was rather usable, but the software was also utterly insane for operators and users, and if I were IA I would only consider it if I budgeted for a from-scratch rewrite (of the stuff in use). Nearly uncontrollable and unintrospectable, with high resource use for no apparent reason.
What's the point of using IPFS then? Others can still spread the file elsewhere and verify it's the correct one, by using the exact same ID of the file, although on two different networks. The beauty of content-addressing I guess.
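That verification is just: hash what you fetched, compare with the ID you asked for. A sketch with a plain SHA-256 digest standing in for the network's actual ID format:

    import hashlib

    def matches(path: str, expected_hex: str) -> bool:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Stream in 1 MiB blocks so big archives don't need to fit in RAM.
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        return h.hexdigest() == expected_hex

(IPFS CIDs are really hashes of a chunked DAG rather than of the raw file bytes, but the trust property is the same.)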
Was that any file in particular? I just tried it myself with a 257 MB PDF (as reported by `ls -lrth`) and it doesn't seem to add that much overhead:
    $ du -sh ~/.ipfs
    84K	/home/user/.ipfs
    $ ipfs add ~/Downloads/large\ PDF\ File.pdf
    added QmSvbEgCuRNZpkKyQm6nA5vz5RTHW1nxb6MJdR4cZUrnDj large PDF File.pdf
    256.58 MiB / 256.58 MiB [============] 100.00%
    $ du -sh ~/.ipfs
    264M	/home/user/.ipfs
Especially if it's about having an Internet Archive backup.
We are talking about an (almost) worldwide archive after all.
Centralized entities emerge to absorb costs because nobody else can do it as efficiently alone.
I would wager at least 95% of "digital memory" archived is absolute garbage, from SEO spam to small websites holding no actual value.
The true digital memory of the world is almost entirely behind the walls of reddit, twitter, facebook, and very few other sites. The internet landscape has changed massively from the 90s and 2000s.
Most casual visitors to IA don't know that. Which is the point.
Giving up is for others.
With the 30-second "time to first byte" speed we all know and love from IA, I'm pretty sure it'd only get faster when you're the only person accessing an obscure document on a random person's shoebox in Korea, as compared to trying to fetch it from a centralised server that has a few thousand other clients to attend to simultaneously.
Depending on scale that’s not necessarily true. I find even today there are many services that cannot keep up with my residential fiber connection (3Gbps symmetrical), whereas torrents frequently can. IA in particular is notoriously slow when downloading from their servers, and even taking into account DHT time torrents can be much faster.
Now if all of their PBs of data were cached in a CDN, yeah that’s probably faster than any decentralized solution. But that will take a heck of a lot more money to maintain than I think is possible for IA.
Sort of like the bittorrent algorithm that favors retrieving and sharing the least-available chunks if you haven't assigned any priority to certain parts.
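(That's BitTorrent's "rarest first" piece selection strategy.) A toy sketch of the idea, with made-up data structures:

    from collections import Counter

    def pick_next_piece(needed: set[int], peers_have: list[set[int]]) -> int | None:
        # Count how many peers hold each piece we still need,
        # then request the least-available one first.
        availability = Counter()
        for have in peers_have:
            availability.update(have & needed)
        candidates = [p for p in needed if availability[p] > 0]
        return min(candidates, key=lambda p: availability[p]) if candidates else None

    # Piece 2 is held by a single peer, so it's fetched (and re-shared) first:
    print(pick_next_piece({0, 1, 2}, [{0, 1}, {0, 1, 2}, {0}]))  # -> 2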
https://news.ycombinator.com/item?id=41860909
I'd never heard of it, but their responses to questions and comments in that thread were really, really good (and I now have "install and configure ArchiveBox on the media server" on my upcoming weekend projects list).
Criminals using tools does not make the tools criminal.
Would people be willing to buy an IA box that hosted a shard of random content along with the things they wanted themselves?
To me that's not even related to it being a torrent tracker, just that they were "aiding and abetting" copyright infringement.
This has precedent in illegal drug categorization: it's not just about the damage, but about the ratio of noxious to helpful use.
>What happens when someone storing decentralized data decides to exit?
They exit, and they no longer store decentralized data. At the very least, IA would still have their copy (or copies), and that data can be spread to other decentralized nodes once it has been determined (through timeouts, etc.) that the person has exited.
> Will data be copied to multiple places[...]?
Ideally, yes. It is fairly trivial to determine the reliability of each member (uptime + hash checks), and reliable members (a few nines of uptime and hash matches) can be trusted to store data with fewer copies while unreliable members can store data with more copies. Could also balance that idea with data that's in higher demand, by storing hot data lots of times on less reliable members while storing cold data on more reliable members.
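A sketch of that replica-count policy, with invented availability numbers (and assuming member failures are independent):

    import math

    def replicas_needed(member_availability: float, target: float = 0.9999) -> int:
        # Add independent copies until at least one is expected to survive:
        # 1 - (1 - a)^n >= target
        n = math.ceil(math.log(1 - target) / math.log(1 - member_availability))
        return max(n, 1)

    print(replicas_needed(0.99))  # 2 copies on members with two nines of uptime
    print(replicas_needed(0.80))  # 6 copies on flaky members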
> who pays for the decentralized storage long term? [...] who is going to pay for doubling, tripling or more the storage costs for backups?
This is unanswered for pretty much any decentralized storage project, and is probably the only important question left. There are people who would likely contribute to some degree without a financial incentive, but ideally there would be some sort of reward. This in theory could be a good use for crypto, but I'd be concerned about the possible perverse incentives and the general disdain the average person has for crypto these days. Funding in general could come from donations received by IA, whatever excess they have beyond their operating costs and reserve requirements - likely nowhere near enough to make something like this "financially viable" (i.e. profitable), but it might be enough to convince people who were on the fence to chip in a few hundred GB and some bandwidth. This is an open question though, and probably the main reason no decentralized storage project has really taken off.
In Law the technicalities matter.
Trackers generally do not host any content, just hashcodes and (sometimes) metadata descriptions of content.
If "your" (ie let's say _you_ TZubiri) client is distributing child pornography content because you have a partially downloaded CP file then that's on _you_ and not on the tracker.
The "tracker" has unique hashcode signatures of tens of millions of torrents - it literaly just puts clients (such as the one that you might be running yourself on your machine in the example above) in touch with other clients who are "just asking" about the same unique hashcode signature.
Some tracker affiliated websites (eg: TPB) might host searchable indexes of metadata associated with specific torrents (and still not host the torrents themselves) but "pure" trackers can literally operate with zero knowledge of any content - just arrange handshakes between clients looking for matching hashes - whether that's UbuntuLatest or DonkeyNotKong
Unfortunately, when I talked to a few archival teams (including the IA) about whether they'd be interested in using it, I either got no response or a negative one.
On the other hand, I also believe that a tracker that hosts hashes of illegal content, provides search facilities for them, and facilitates their download is responsible, in a big way. That's my personal opinion, and I think it's backed by cases like The Pirate Bay and Sci-Hub.
That zero-knowledge tracker is interesting; my first reaction is that it's going to end up in very nasty places like Tor, onion services, etc.
Downloading from example.com is just peer to peer with someone big. There's lots of hosting providers and DNS providers that are happy to host illegal-in-some-places content.
Most actual trackers are zero knowledge.
A tracker (bit of central software that handles 100+ thousand connections/second) is not a "torrent site" such as TPB, EZTV, etc.
A tracker handshakes torrent clients and introduces peers to each other, it has no idea nor needs an idea that "SomeName 1080p DSPN" maps to D23F5C5AAE3D5C361476108C97557F200327718A
All it needs is to store IP addresses that are interested in that hash and to pass handfuls of interested IP addresses to other interested parties (and some other bookkeeping).
From an actual tracker PoV the content is irrelevant and there's no means of telling one thing from another other than size - it's how trackers have operated for 20+ years now.
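In code terms the core really is just a key-value store of hash -> interested peers. A toy sketch, ignoring the actual announce wire format, timeouts, and scrape stats:

    from collections import defaultdict

    # infohash -> set of (ip, port) pairs currently interested in it
    swarms: dict[bytes, set[tuple[str, int]]] = defaultdict(set)

    def announce(infohash: bytes, peer: tuple[str, int], want: int = 50) -> list[tuple[str, int]]:
        # Record the caller as interested and hand back some other interested
        # peers. No file content ever passes through here, only opaque hashes.
        others = [p for p in swarms[infohash] if p != peer][:want]
        swarms[infohash].add(peer)
        return others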
Here are some actual tracker addresses and ports:
udp://tracker.opentrackr.org:1337/announce
udp://p4p.arenabg.com:1337/announce
udp://tracker.torrent.eu.org:451/announce
udp://tracker.dler.org:6969/announce
udp://open.stealth.si:80/announce
udp://ipv4.tracker.harry.lu:80/announce
https://opentracker.i2p.rocks:443/announce
Here's the BitTorrent protocol: http://bittorrent.org/beps/bep_0052.html
Trackers can hand out .torrent files if asked (bencoded dictionaries that describe filenames, sizes, checksums, and directory structures of a torrent's contents) but they don't have to; mostly they hand out peer lists of other clients .. peers can also answer requests for .torrent files.
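For reference, bencoding itself is tiny; a minimal encoder covering the four types the spec defines:

    def bencode(value) -> bytes:
        # The four bencode types: integers, byte strings, lists, dicts.
        if isinstance(value, int):
            return b"i%de" % value
        if isinstance(value, bytes):
            return b"%d:%s" % (len(value), value)
        if isinstance(value, list):
            return b"l" + b"".join(bencode(v) for v in value) + b"e"
        if isinstance(value, dict):  # keys are byte strings, serialized in sorted order
            return b"d" + b"".join(bencode(k) + bencode(v) for k, v in sorted(value.items())) + b"e"
        raise TypeError(type(value))

    print(bencode({b"name": b"example.iso", b"length": 1234}))
    # b'd6:lengthi1234e4:name11:example.isoe'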
A .torrent file isn't enough to determine illegal content.
Pornography can be contained in files labelled "BeautifulSunset.mkv" and Rick Astley parody videos can frequently be found in files labelled "DirtyFilthyRepublicanFootTappingNudeAfrica.avi"
Given that, it's not clear how trackers could effectively filter content that never actually traverses their servers.
* Strictly speaking, running in-browser, but that sounded like "Bowser" so I wrote online instead.
What there isn't is a currently maintained and advertised client and plan. That I can find. Clunky or not, incomplete or not.
There are other systems that have a rough plan for duplication and local copy and backup. You can easily contribute to them, run them, or make local copies. But not IA. (I mean you can try and cook up your own duplication method. And you can use a personal solution to mirror locally everything you visit and such.) No duplication or backup client or plan. No sister mirrored institution that you might fund. Nothing.
Mathematically, a tracker would offer a function that, given a hash, returns a list of peers with that file.
While a "torrent site" like TPB or SH, would offer a search mechanism, whereby they would host an index, content hashes and english descriptors, along with a search engine.
A user would then first need to use the "torrent site" to enter their search terms and find the hash, then give the hash to a tracker, which would return the list of peers?
Is that right?
In any case, each party in the transaction shares liability. If we were analyzing a drug case or a people-trafficking case, each distributor, wholesaler or retailer would bear liability and face criminal charges. A legal defense of the type "I just connected buyers with sellers, I never exchanged the drug" would not have much chance of succeeding, although it is a common method to obstruct justice by complicating evidence gathering. (One member collects the money, the other gives the drugs.)
> Is that right?
More or less.
> In any case, each party in the transaction shares liability.
That's exactly right Bob. Just as a telephone exchange shares liability for connecting drug sellers to drug buyers when given a phone number.
Clearly the telephone exchange should know by the number that the parties intend to discuss sharing child pornography rather than public access to free to air documentaries.
How do you propose that a telephone exchange vet phone numbers to ensure drugs are not discussed?
Bear in mind that in the case of a tracker the 'call' is NOT routed through the exchange.
With a proper telephone exchange the call data (voices) pass through the exchange equipment; with a tracker, no actual file content passes through the tracker's hardware.
The tracker, given a number, tells interested parties about each other .. they then talk directly to each other; be it about The Sky at Night -s2024e07- 2024-10-07 Question Time or about Debbie Does Donkeys.
Also keep in mind that trackers juggle a vast volume of connections of which a very small amount would be (say) child abuse related.
There are so many proven distributed archiving systems, a lot of which are mentioned in these comments.
https://docs.google.com/document/d/1qKgIjUTef-I-BLWjn4sEIbYo...
I'll write up a more detailed article on it, though; it'll be good to at least have the doc public somewhere.
In practice, that's mostly how they're being used.
But the protocol does support mutation. The BEP describing the behavior even has archive.org as an example...
> The intention is to allow publishers to serve content that might change over time in a more decentralized fashion. Consumers interested in the publisher's content only need to know their public key + optional salt. For instance, entities like Archive.org could publish their database dumps, and benefit from not having to maintain a central HTTP feed server to notify consumers about updates.
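Mechanically, that's BEP 44's mutable DHT items: the lookup target is derived from the publisher's public key (plus an optional salt), and each update is a signed, sequence-numbered value, so nodes can accept newer versions without trusting anyone. A rough sketch with a dict standing in for the DHT (the real signed payload is a bencoded blob; this simplified one is just seq + value):

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    dht: dict[bytes, tuple[int, bytes, bytes]] = {}  # target -> (seq, value, signature)

    signing_key = Ed25519PrivateKey.generate()
    pubkey_bytes = signing_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)

    def target(salt: bytes = b"") -> bytes:
        # Consumers only ever need the public key (+ salt) to find the latest value.
        return hashlib.sha1(pubkey_bytes + salt).digest()

    def publish(value: bytes, seq: int, salt: bytes = b"") -> None:
        sig = signing_key.sign(seq.to_bytes(8, "big") + value)
        stored = dht.get(target(salt))
        if stored is None or seq > stored[0]:  # nodes only accept newer sequence numbers
            dht[target(salt)] = (seq, value, sig)

    publish(b"infohash-of-dump-2024-01", seq=1)
    publish(b"infohash-of-dump-2024-02", seq=2)  # followers now resolve to the new dump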
If this is what people think we need to work on education...
I miss when TPB used to have a CSV of all their magnet links; their new UI is trash. I can't even find anything like in the old days. TPB is pretty much a dying old relic.
So long as this distributed protocol has the concept of individual files, there _will_ be clients out there that allow the user to select `popular-site.archive.tar.gz` and not `less-popular.tar.gz` for download.
And what one person doesn't download... they can't seed back. Distributed stuff is really good for low cost, high scale distribution of in-demand content. It's _terrible_ for long term reliability/availability, though.
Risk management is a balance, not the fearmongering you describe. That's why I'd rather take advice from people with daily experience than look at the newsworthy incidents (you'll never see "nothing happened today, again; regular security patches working fine") and conclude you'd attract threats and cyber attacks just by hosting backup copies of parts of the Internet Archive.
Right now there are torrents, and I do keep any torrents I download from IA in my client for years, but torrents mean I only get to contribute by sharing the things I downloaded in the past.
Side note: as an outsider, and someone who hasn't tried either version of Freenet in almost 2 decades, was this schism kind of like the Python 2 vs. Python 3 kerfuffle? Is there more to it?
[0]: https://www.hyphanet.org/
[1]: https://freenet.org/
If you have a RAID, then you have 2 copies with something like 99.99% availability and a mean time to failure measured in years.
With a volunteer drive you have ?% availability and ? years to failure. You can't depend on it.
Also, the average value of the data is very low; you don't want to be making many copies of it for no reason.
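For intuition, under the (generous) assumption that copies fail independently:

    def combined_availability(per_copy: float, copies: int) -> float:
        # Chance that at least one of n independent copies is reachable.
        return 1 - (1 - per_copy) ** copies

    print(combined_availability(0.9999, 2))  # a mirrored pair: ~99.999999%
    print(combined_availability(0.50, 2))    # two flaky volunteer drives: 75%
    print(combined_availability(0.50, 10))   # ten flaky volunteer drives: ~99.9%

So unknown-reliability volunteers aren't useless, but you pay for their flakiness in copy count.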
I'll restate the principle of the good-usage-to-bad-usage ratio: telephone providers are a well-established service with millions of legitimate users and uses. Furthermore, they are a recognized service in law, they are regulated, and they can comply with law enforcement.
They are closer to the ISP, which according to my theory has some liability as well.
It's just a matter of the liability being small and the service to society being useful and necessary.
To take a spin to a similar but newer tech, consider crypto. My position is that its legality and liability for illegal usage by users (considering that of exchanges and online wallets, since the network is often not a legal entity) will depend on the ratio of legitimate to illegitimate use that will be given to it.
There's definitely a second-system effect, where undesirables go to the second system, so it might be a semantic difference unrelated to the technical protocols. Maybe if one system came first, or if by chance it were the most popular, the tables would be turned.
But I feel more strongly that there are design features that make law compliance, traceability and accountability difficult. In the case of trackers, perhaps the microservice/object is a simple key-value store, but it is semantically associated with other protocols which have the 'noxious' features described above AND are semantically associated with illegal material.
> Also, the average value of the data is very low; you don't want to be making many copies of it for no reason.
The reason is that the value of that data is high to the archivist, since they want to preserve it.
Realistically you won't get enough volunteer-storage to cover one IA. And even if you did, it wouldn't satisfy the mission requirements, which is to store reliably for decades all of the data.
The average person, in my experience, can barely work a non-cellphone filesystem and actively stresses when a terminal is in front of them, even for a brief moment. Education went out the window a decade ago.
Well, OK, maybe other webpage archives don't work as well, I haven't tried them, but there are others. And they're newer, so don't have such extensive historical pages.
Large numbers of Wikipedia references (which relied on IA to prevent link rot) must be completely broken now.
Neither version of Freenet is designed for long-term archiving of large amounts of data, so it probably isn't ideally suited to replacing archive.org, but we are planning to build decentralized alternatives to services like Wikipedia on top of Freenet.
[1] https://freenet.org/faq/#why-was-freenet-rearchitected-and-r...
Ditto trackers.
Have a look at the graphs here: https://opentrackr.org/
Over 10 million torrents tracked daily, on the order of 300 thousand connections per second, handshaking between some 200 million peers per week.
That's material from the Internet Archive, software releases, pooled filesharing, legitimate content sharing via embedded clients that use torrents to share load, and a lot of TV and movies that have variable copyright status.
( One of the largest TV|movie sharing sites for decades recently closed down after the sole operator stopped bearing the cost and didn't want to take on dubious revenue sources; that was housed in a country that had no copyright agreements with the US or UK and was entirely legal on its home soil.
Another "club" MVGroup only rip documentaries that are "free to air" in the US, the UK, Japan, Australia, etc. and in 20 years of publicaly sharing publicaly funded content haven't had any real issues )
> the ISP, which according to my theory has some liability as well.
The world's a big place.
The US MPA (Motion Picture Association - the big five) backed an Australian mini-me group, AFACT (Australian Federation Against Copyright Theft), to establish ISP liability in a G20 country as a beachhead bit of legislation.
That did not go well: Roadshow Films Pty Ltd v iiNet Ltd decided in the High Court of Australia (2012) https://en.wikipedia.org/wiki/Roadshow_Films_Pty_Ltd_v_iiNet...
The alliance of 34 companies unsuccessfully claimed that iiNet authorised primary copyright infringement by failing to take reasonable steps to prevent its customers from downloading and sharing infringing copies of films and television programs using BitTorrent.
That was a three strikes total face plant: The trial court delivered judgment on 4 February 2010, dismissing the application and awarding costs to iiNet.
An appeal to the Full Court of the Federal Court was dismissed.
A subsequent appeal to the High Court was unanimously dismissed on 20 April 2012.
It set a legal precedent: This case is important in copyright law of Australia because it tests copyright law changes required in the Australia–United States Free Trade Agreement, and set a precedent for future law suits about the responsibility of Australian Internet service providers with regards to copyright infringement via their services.
It's also now part of Crown Law .. ie. not directly part of the core British Law body, but a recognised bit of Commonwealth High Court Law that can be referenced for consideration in the UK, Canada, etc.

> but it is semantically associated with other protocols which have the 'noxious' features described above AND are semantically associated with illegal material.
Gosh, semantics, hey. Some people feel in their waters that this is a protocol used by criminals and must therefore be banned or policed into non-existence?
Is that a legal argument?
I also indicated above that having knowledge of .torrent manifests is problematic, as that doesn't provide real actual knowledge of file contents, just knowledge of file names ... LatestActionMovie.mkv might be a rootkit virus and HappyBunnyRabbits.avi might be the worst, most exploitative underage pornography you can think of.
Some trackers are also private and require membership keys to access.
I was skating a lot as TZubiri seems unaware of many of the actual details and legitimate use cases, existing law, etc.