Internet Archive breached again through stolen access tokens

1. myself248 ◴[20 Oct 24 15:39 UTC] No.41896048[source]▶

I'd like to imagine a world where every lawyer, when their case is helped by a Wayback Machine snapshot of something, flips a few bucks to IA. They could afford a world-class admin team in no time flat.

replies(2): >>41896197 #>>41897663 #

2. thaumasiotes ◴[20 Oct 24 15:57 UTC] No.41896197[source]▶

>>41896048 (TP) #

That's a terrible solution. The Wayback Machine takes down their snapshots at the request of whoever controls the domain. That's not archival.

If the state of a webpage in the past matters to you, you need a record that won't cease to exist when your opposition asks it to. This is the concept behind perma.cc.

replies(3): >>41896261 #>>41896697 #>>41896848 #

3. myself248 ◴[20 Oct 24 16:04 UTC] No.41896261[source]▶

>>41896197 #

Ooo, excellent. Yes, hiding items is imperfect, but I understood that it was legally required or something. (IANAL and IDFK, TBH) I wonder how perma.cc gets around that.

replies(2): >>41896643 #>>41896764 #

4. immibis ◴[20 Oct 24 16:46 UTC] No.41896643{3}[source]▶

>>41896261 #

Most likely by breaking the law.

5. db48x ◴[20 Oct 24 16:51 UTC] No.41896697[source]▶

>>41896197 #

No, they don’t delete the archived content. When the domain’s robots.txt file bans spidering, then the Wayback Machine _hides_ the content archived at that domain. It is still stored and maintained, but it isn’t distributed via the website. The content will be unhidden if the robots.txt file stops banning spiders, or if an appropriate request is made.
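A minimal sketch of the hide/unhide rule described above, assuming visibility is driven only by the domain's current robots.txt. The function name `is_hidden` and the crawler name `example_archiver` are hypothetical, and this is not the Archive's actual code; it only illustrates that hiding is a reversible check and not a deletion.

```python
from urllib.robotparser import RobotFileParser


def is_hidden(robots_txt: str, url: str, agent: str = "example_archiver") -> bool:
    """Return True if the given robots.txt text bans `agent` from `url`.

    `agent` is a hypothetical crawler name used for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# A robots.txt that bans all spiders, and one that allows them again.
banning = "User-agent: *\nDisallow: /\n"
permissive = "User-agent: *\nDisallow:\n"

hidden_now = is_hidden(banning, "https://example.com/page")
hidden_later = is_hidden(permissive, "https://example.com/page")
```

Under this assumption the stored snapshot is never touched: only the answer to "may this be served?" changes when the robots.txt changes.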

replies(6): >>41896874 #>>41896927 #>>41896931 #>>41900009 #>>41902646 #>>41903368 #

6. berdario ◴[20 Oct 24 17:00 UTC] No.41896764{3}[source]▶

>>41896261 #

I'm afraid that it just hasn't been tested in court yet.

I haven't read this paper yet, but...

https://www.tesble.com/10.1080/0270319x.2021.1886785

from the abstract:

> The article concludes that Perma.cc's archival use is neither firmly grounded in existing fair use nor library exemptions; that Perma.cc, its "registrar" library, institutional affiliates, and its contributors have some (at least theoretical) exposure to risk

It seems that the article is about copyright, but of course there are several other reasons that might justify takedown of content stored on perma.cc:

- Right to be forgotten... perma.cc might be able to ignore it, but could this lead to perma.cc being blocked by European ISPs?

- ITAR stuff

- content published by entities recognized by $GOVERNMENT as terrorist organizations

- revenge porn

- CSAM

replies(1): >>41907558 #

7. speerer ◴[20 Oct 24 17:09 UTC] No.41896848[source]▶

>>41896197 #

That's correct, but only for present evidence - what about the past evidence, that you didn't know you needed until it was too late? IA is broad enough to cover the past five times out of ten.

8. speerer ◴[20 Oct 24 17:12 UTC] No.41896874{3}[source]▶

>>41896697 #

In some cases they do appear to delete, on request.

edit: "Other types of removal requests may also be sent to info@archive.org. Please provide as clear an explanation as possible as to what you are requesting be removed for us to better understand your reason for making the request.", https://help.archive.org/help/how-do-i-request-to-remove-som...

replies(1): >>41897510 #

9. ◴[20 Oct 24 17:20 UTC] No.41896927{3}[source]▶

>>41896697 #

10. Raed667 ◴[20 Oct 24 17:21 UTC] No.41896931{3}[source]▶

>>41896697 #

They do delete entire domains from the archive upon request & proof of ownership.

replies(1): >>41897513 #

11. db48x ◴[20 Oct 24 18:39 UTC] No.41897510{4}[source]▶

>>41896874 #

Nope. Nothing is deleted, just hidden.

replies(1): >>41897575 #

12. db48x ◴[20 Oct 24 18:39 UTC] No.41897513{4}[source]▶

>>41896931 #

Again, no they don’t. They just hide them.

13. rascul ◴[20 Oct 24 18:47 UTC] No.41897575{5}[source]▶

>>41897510 #

How do you know?

replies(1): >>41897605 #

14. db48x ◴[20 Oct 24 18:51 UTC] No.41897605{6}[source]▶

>>41897575 #

I worked there for a short while.

replies(1): >>41897920 #

15. ◴[20 Oct 24 18:59 UTC] No.41897663[source]▶

>>41896048 (TP) #

16. bombcar ◴[20 Oct 24 19:38 UTC] No.41897920{7}[source]▶

>>41897605 #

So if the Internet Archive accidentally archived child porn, they wouldn’t delete it?

I suspect they DO delete some things.

replies(1): >>41899502 #

17. db48x ◴[20 Oct 24 23:52 UTC] No.41899502{8}[source]▶

>>41897920 #

Don't be asinine; of course there are exceptions. But the general rule is that nothing is deleted. Even if you have a fancy expensive lawyer send them a C&D letter asking them to delete something or else, they’ll just hide it. You can’t tell the difference from the outside. In fact there are monitoring alarms that are triggered if something _is_ deleted.

replies(1): >>41899983 #

18. thimabi ◴[21 Oct 24 01:41 UTC] No.41899983{9}[source]▶

>>41899502 #

Claiming to have deleted something while just having hidden it from public view… that’s basically begging content owners to sue and very easily win damages.

replies(1): >>41900358 #

19. null0pointer ◴[21 Oct 24 01:44 UTC] No.41900009{3}[source]▶

>>41896697 #

What’s the reasoning behind hiding content upon request? Doesn’t that defeat the purpose of archival?

My intuition would say there are 3 cases when content ceases to be available at the original site:

- The host becomes unable to host the content for some reason (bankruptcy, death, etc.) in which case I assume the archive persists.

- The host is externally required to remove the content (copyright, etc.) in which case I assume IA would face the same external pressure? But I’m not sure on that.

- The host/owner has a change of heart about publishing the content. This borders more on IA acting as reputation management on the part of the original host/owner. Personally I think this is hardest to defend but also probably the least common case. In this case I’d think it’s most often to hide something the original host doesn’t want the public finding out later, but that also seems to make it more valuable to be publicly available in the archive. Plus, from a historian/journalist perspective, it’s valuable to be able to track how things change over time, and hiding this from the public prevents that. Though to be honest I’m kind of in two minds here because on the other hand I’m generally of the opinion that people can grow and change, and we shouldn’t hold people to account for opinions they published a decade ago, for example. I’m also generally in favor of the right to be forgotten.

Would appreciate your thoughts here.

replies(1): >>41900308 #

20. db48x ◴[21 Oct 24 02:59 UTC] No.41900308{4}[source]▶

>>41900009 #

It’s all about copyright. Copyright law in the US gives a monopoly on distribution of copies of things (hand‐waving because the definitions are hard, basically artistic works) to their author. Of course authors usually delegate that right to their publisher for practical and financial reasons. There are some fair use exceptions, but this basically makes it illegal for anyone else to make and distribute copies of the author’s work. Again, hand‐waving because I don't want to have to write a dissertation.

When IA shows you what a website looked like in the past, they are reproducing a copyrighted work and distributing it to you. In some cases, perhaps many, this is fair use. IA cannot really know ahead of time which viewers would be exercising their fair use rights and which would not. Instead, IA just makes everything available without trying to guess whether the access would fall under fair use or not. That means that many times, possibly most of the time, IA is technically breaking the law by illegally distributing copies of copyrighted works.

But _owning_ a copy of a copyrighted work is never prohibited by copyright. It doesn’t matter how you got the copy either.

Therefore, pretty much any time someone asks for something to be hidden or removed on copyright grounds, they go ahead and hide it. They don’t bother to delete it though, because copyright doesn’t require them to. If a copyright holder asks for it to be deleted then they are overreaching, and should know that any sane person would object. But as far as I am aware IA doesn’t actually bother to object in writing; they just hide the content and move on.

This means that researchers can visit the archive in person and request permission to see those copies. For example if you are studying the history of artistic techniques in video games using emulated software on IA, you might eventually notice that all the games from one major publisher are missing (except iirc the original Donkey Kong, because they don’t actually own the copyright on that one). You could then journey to the Archive in person to see the missing material and fill in the gaps in your history. Or you could just ignore them entirely out of spite. This is no different than viewing rare books held by any library, or viewing unexhibited artifacts held by a museum, etc.

replies(1): >>41901967 #

21. db48x ◴[21 Oct 24 03:10 UTC] No.41900358{10}[source]▶

>>41899983 #

Copyright only regulates the distribution of copies of copyrighted works. Possessing copies and distributing copies to other people are two different things.

If you were photocopying a textbook and giving it to your classmates, the publisher could have their lawyer send you a Cease and Desist letter telling you to stop (or else). But if they told you to burn your copy of the textbook then they would be overreaching, and everyone would laugh at them when you took that story to the papers.

Legal reasoning from made‐up examples is generally a bad idea, but I think you can safely reason from that one.

I’m not privy to the actual communications in these cases, but I suspect that instead of replying back with “we deleted the content from the Archive”, they instead say something anodyne like “the content is no longer available via the Wayback Machine”. Smart lawyers will notice the difference, but then a smart lawyer wouldn’t have expected anything else.

replies(2): >>41902059 #>>41903264 #

22. null0pointer ◴[21 Oct 24 08:42 UTC] No.41901967{5}[source]▶

>>41900308 #

Thanks for the detailed response, very informative. This sounds similar to DMCA takedown requests, though I’m not knowledgeable enough to know the distinction. It’s a shame that to view hidden archives one needs to visit the archive in person, but I guess if IA were to respond to email requests for such archives they would be guilty of breaking the same distribution rule. The major difference between the rare books or museum examples and content on IA is that the digital artifacts are infinitely reproducible and transportable so the physical visit required to view them seems totally unnecessary on its face.

It’s a shame that to be able to run an above-board _Internet_ Archive one needs to bend to the whim of anachronistic copyright law and forgo all the benefits of the internet in the first place. This seems like it would inevitably mean that any _internet_ archive that is truly accessible over the _internet_ would be forced to operate illegally in a similar manner to SciHub.

I know I hold a rather strong opinion regarding copyright law (I’m not looking to debate it here as I know others hold different opinions which is totally fine), but IMHO copyright law has been a major blight on humanity at large and especially the internet. Major reform is in order at the very least, if not total abolishment.

replies(1): >>41909636 #

23. thaumasiotes ◴[21 Oct 24 08:58 UTC] No.41902059{11}[source]▶

>>41900358 #

> Legal reasoning from made‐up examples is generally a bad idea

What? That's the only way to do legal reasoning, and as an obvious consequence it's how both lawyers and judges do it.

replies(1): >>41902680 #

24. DoctorOetker ◴[21 Oct 24 10:32 UTC] No.41902646{3}[source]▶

>>41896697 #

That distinction becomes nearly moot in lots of cases:

* it prevents victims from performing discovery (gathering evidence) before starting a trial or confiding to an expensive lawyer whose loyalty may turn out to systematically lie with the perpetrators or highest bidders.

* it prevents people who requested a snapshot (and thus know a specific URL with relevant information) from proving their version of events to acquaintances, say during or after a court case in which their lawyers just spin a random story instead of submitting the evidence as requested. A disloyal lawyer will have informed the counterparty, and the counterparty will have requested "removal" of the page at IA. The result is psychological isolation of the victim, who can no longer point to the pages with direct or supporting evidence.

Anyone with even basic understanding of cryptographic hashes and signatures would understand that:

1) for a tech-savvy entity (which an internet archival entity automatically is expected to be)

2) in the face of changing norms and values (regardless of static or changing laws: throughout history violations were systematically turned a blind eye to)

3) given the shameless nature of certain entities, self-describing their criminal behavior on their commercial webpages

Any person understanding the above 3 points concludes that such an archival company cannot possibly occupy some imaginary "middle ground" between:

A) Defender of truth and evidence, freedom fighter, human rights activist, so that humanity can learn from mistakes and crimes

B) Status quomplicit oppressor of evidence

Because any imaginary hypothetical "middle ground" entity would quickly be inundated by legal requests from companies hiding their suddenly permanently visible crimes, and simultaneously by requests for reinstatement from victims pleading for public access to the evidence.

Once we know it's either A or B, and recalling the "tech-savvy" point (point 1), we can summarily conclude that a class A archival entity would helpfully assist victims as follows: don't just provide easy page archival buttons, but also provide attestations: say zip files of the pages, with an autogenerated legalese PDF, with hashes of the page and the date of observation, cryptographically signed by the IA. This way a victim can prove to police, lawyers, judges, or in case those locally work against them, prove to friends, family, ... that the IA did in fact see the information and evidence.
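The attestation idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the manifest layout are hypothetical, and HMAC from the Python standard library stands in for the signature. HMAC is symmetric, so only the key holder can verify it; a real attestation meant for third parties would need a public-key signature (for example Ed25519) so that anyone can check it against the archive's published key.

```python
import hashlib
import hmac
import json


def make_attestation(page: bytes, url: str, observed_at: str, key: bytes) -> dict:
    """Build a signed record that `page` was observed at `url` on `observed_at`."""
    manifest = {
        "url": url,
        "observed_at": observed_at,
        "sha256": hashlib.sha256(page).hexdigest(),
    }
    # Serialize deterministically so the same manifest always yields the same bytes.
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    # Stand-in for a real signature: an HMAC tag over the manifest (assumption).
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": tag}


def verify_attestation(att: dict, page: bytes, key: bytes) -> bool:
    """Check that the tag matches the manifest and the manifest matches the page."""
    manifest = att["manifest"]
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    tag_ok = hmac.compare_digest(expected, att["signature"])
    page_ok = hashlib.sha256(page).hexdigest() == manifest["sha256"]
    return tag_ok and page_ok


page = b"<html><body>example page</body></html>"
key = b"hypothetical-archive-key"
att = make_attestation(page, "https://example.com/", "2024-10-21T10:32:00Z", key)
```

The point of the design is that the proof travels with the victim: even if the page is later hidden, the manifest and signature still show what was observed and when.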

I leave it to the reader to locate the attestation package zips for these pages, in order to ascertain that the IA is a class A organization, and not a class B one.

25. db48x ◴[21 Oct 24 10:37 UTC] No.41902680{12}[source]▶

>>41902059 #

It would be better to quote the actual text of the law than to make up a silly hypothetical on the spot, but that would be more work.

Even better would be to quote from some case where a judge has applied the law to actual events.

26. jpc0 ◴[21 Oct 24 12:08 UTC] No.41903264{11}[source]▶

>>41900358 #

I'm not going to look up legal precedent, hire a lawyer if you want that.

You are wrong: copyright specifically prohibits copying, not distribution. They can get a cease and desist that requests you destroy property, and they can get a court order backing that, which will put you in contempt of court if you fail to comply.

Proving damages is easier with distribution, but that is a civil matter not a criminal matter.

replies(1): >>41903727 #

27. wl ◴[21 Oct 24 12:20 UTC] No.41903368{3}[source]▶

>>41896697 #

They ignore robots.txt these days.

replies(1): >>41909714 #

28. db48x ◴[21 Oct 24 13:03 UTC] No.41903727{12}[source]▶

>>41903264 #

They’re not making copies either.

replies(1): >>41909175 #

29. myself248 ◴[21 Oct 24 19:29 UTC] No.41907558{4}[source]▶

>>41896764 #

So, precisely the same constraints that IA operates under, just perma.cc isn't big enough yet to have been forced to comply with them?

I'll hold my breath.

30. jpc0 ◴[21 Oct 24 22:20 UTC] No.41909175{13}[source]▶

>>41903727 #

So they aren't making copies? How then do they have an archive of internet resources if not by copying said resources?

You do realise that "downloading" is implicitly a copy?

If you want to actually have a civil discussion then you need to make a more reasonable argument than "They're not making copies either."

Sounds like whatever role you played at IA when you were there didn't give you any actual insight into what happens in operation and you simply tried to prove your point with an appeal to authority instead of backing it with facts and reason.

replies(1): >>41909595 #

31. db48x ◴[21 Oct 24 23:23 UTC] No.41909595{14}[source]▶

>>41909175 #

Ahem. We are discussing items which have been hidden. A hidden item which users of the archive cannot access is not being copied. It’s just sitting there on a drive. Occasionally an automated process comes along and computes its hash to make sure there hasn’t been any bitrot. There’s no warehouse of undistributed copies of the thing that a court can order the Archive to destroy.
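A rough sketch of the kind of periodic hash check mentioned above, assuming each stored item has a hash recorded at ingest time. The names `check_fixity`, `stored`, and `recorded` are hypothetical, and the items live in memory only to keep the example self-contained; this is not the Archive's actual tooling.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def check_fixity(stored: dict[str, bytes], recorded: dict[str, str]) -> list[str]:
    """Return the sorted IDs of items that are missing or whose hash changed."""
    damaged = []
    for item_id, expected in recorded.items():
        data = stored.get(item_id)
        # A missing item is reported too, matching the idea that deletions raise alarms.
        if data is None or sha256_hex(data) != expected:
            damaged.append(item_id)
    return sorted(damaged)


# Hashes recorded when the items were first stored.
stored = {"item-a": b"hello", "item-b": b"world"}
recorded = {item_id: sha256_hex(data) for item_id, data in stored.items()}

# Simulate bitrot in one item and the loss of nothing else.
rotted = dict(stored, **{"item-b": b"worle"})
```

Note that the check reads each item once and compares digests; it does not produce anything that is distributed to users.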

replies(1): >>41912502 #

32. db48x ◴[21 Oct 24 23:31 UTC] No.41909636{6}[source]▶

>>41901967 #

Yea, it’s pretty weird. There’s no technical reason for it, merely a legal one.

33. db48x ◴[21 Oct 24 23:42 UTC] No.41909714{4}[source]▶

>>41903368 #

Hmm. I’ll have to try to remember that :)

34. jpc0 ◴[22 Oct 24 09:06 UTC] No.41912502{15}[source]▶

>>41909595 #

How did it get to the drive other than having been copied there?

You are once again discussing distribution not copying.

Your distinction might apply to an individual holding one or two copies of some copyrighted material because it isn't worth the legal hassle to go after them.

For someone like the IA holding, by your claims, terabytes of copyrighted material, "Big copyright" absolutely will go after them for that if it is true, and it would sink the IA in legal fees.

There's nuance in all things, which is why I said that if you actually care, hire a lawyer just like they would. But your comment of "copying is fine, if you don't distribute" only applies where fair use applies, which means if I hold terabytes of copies I am not legally allowed to have made, I'm probably going to be spending a large portion of my life repaying that decision, just like any other person would, should I get caught.