
492 points by vladyslavfox | 3 comments
myself248 (No.41896048)
I'd like to imagine a world where every lawyer, when their case is helped by a Wayback Machine snapshot of something, flips a few bucks to IA. They could afford a world-class admin team in no time flat.
thaumasiotes (No.41896197)
That's a terrible solution. The Wayback Machine takes down its snapshots at the request of whoever controls the domain. That's not archival.

If the state of a webpage in the past matters to you, you need a record that won't cease to exist when your opposition asks it to. This is the concept behind perma.cc.

db48x (No.41896697)
No, they don’t delete the archived content. When a domain’s robots.txt file bans spidering, the Wayback Machine _hides_ the content archived from that domain. It is still stored and maintained, but it isn’t distributed via the website. The content will be unhidden if the robots.txt file stops banning spiders, or if an appropriate request is made.
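A minimal sketch of the robots.txt rule described above, using Python's standard-library `urllib.robotparser`. It assumes the hide/show decision depends only on whether the site's *current* robots.txt disallows the archive's crawler for a given URL; the crawler name `ia_archiver` and the helper `is_hidden` are illustrative assumptions, not the Wayback Machine's actual code. Hiding here is purely a check made when a page is served, so the stored snapshots are never touched.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler name, for illustration only; not confirmed as what the archive checks.
ARCHIVE_AGENT = "ia_archiver"


def is_hidden(robots_txt: str, url: str, agent: str = ARCHIVE_AGENT) -> bool:
    """Return True if the site's current robots.txt bans `agent` from `url`.

    In this model, True means archived snapshots of `url` stay stored but
    are not served; False means they are shown (again).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network fetch
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    banned = "User-agent: *\nDisallow: /\n"
    lifted = "User-agent: *\nDisallow:\n"  # an empty Disallow allows everything
    page = "https://example.com/old-post"
    print(is_hidden(banned, page))  # True: snapshots hidden
    print(is_hidden(lifted, page))  # False: snapshots visible again
```

Because the check is re-evaluated against whatever robots.txt the site serves now, removing the `Disallow` line flips the result back without re-archiving anything. The other route mentioned, an explicit request, is not modeled.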
null0pointer (No.41900009)
What’s the reasoning behind hiding content upon request? Doesn’t that defeat the purpose of archival?

My intuition would say there are 3 cases in which content stops being available at the original site:

- The host becomes unable to host the content for some reason (bankruptcy, death, etc.), in which case I assume the archive persists.

- The host is externally required to remove the content (copyright, etc.), in which case I assume IA would face the same external pressure? But I’m not sure about that.

- The host/owner has a change of heart about publishing the content. This borders more on IA acting as reputation management on behalf of the original host/owner. Personally I think this is the hardest case to defend, but also probably the least common. Here I’d guess the motive is most often to hide something the original host doesn’t want the public finding out later, but that also seems to make it more valuable to keep publicly available in the archive. Plus, from a historian’s or journalist’s perspective, it’s valuable to be able to track how things change over time, and hiding content from the public prevents that. Though to be honest I’m in two minds here, because on the other hand I’m generally of the opinion that people can grow and change, and we shouldn’t hold people to account for opinions they published a decade ago, for example. I’m also generally in favor of the right to be forgotten.

Would appreciate your thoughts here.

db48x (No.41900308)
It’s all about copyright. Copyright law in the US gives a monopoly on distribution of copies of things (hand‐waving because the definitions are hard, basically artistic works) to their author. Of course authors usually delegate that right to their publisher for practical and financial reasons. There are some fair use exceptions, but this basically makes it illegal for anyone else to make and distribute copies of the author’s work. Again, hand‐waving because I don't want to have to write a dissertation.

When IA shows you what a website looked like in the past, they are reproducing a copyrighted work and distributing it to you. In some cases, perhaps many, this is fair use. IA cannot really know ahead of time which viewers would be exercising their fair use rights and which would not. Instead, IA just makes everything available without trying to guess whether the access would fall under fair use or not. That means that many times, possibly most of the time, IA is technically breaking the law by illegally distributing copies of copyrighted works.

But _owning_ a copy of a copyrighted work is never prohibited by copyright. It doesn’t matter how you got the copy either.

Therefore, pretty much any time someone asks for something to be hidden or removed on copyright grounds, IA goes ahead and hides it. It doesn’t bother to delete it, though, because copyright doesn’t require that. If a copyright holder asks for it to be deleted, then they are overreaching, and should know that any sane person would object. But as far as I am aware IA doesn’t actually bother to object in writing; it just hides the content and moves on.

This means that researchers can visit the archive in person and request permission to see those copies. For example, if you are studying the history of artistic techniques in video games using emulated software on IA, you might eventually notice that all the games from one major publisher are missing (except, IIRC, the original Donkey Kong, because they don’t actually own the copyright on that one). You could then journey to the Archive in person to see the missing material and fill in the gaps in your history. Or you could just ignore them entirely out of spite. This is no different from viewing rare books held by any library, or unexhibited artifacts held by a museum, etc.

null0pointer (No.41901967)
Thanks for the detailed response, very informative. This sounds similar to DMCA takedown requests, though I’m not knowledgeable enough to know the distinction. It’s a shame that to view hidden archives one needs to visit the archive in person, but I guess if IA were to respond to email requests for such archives they would be guilty of breaking the same distribution rule. The major difference between the rare books or museum examples and content on IA is that the digital artifacts are infinitely reproducible and transportable, so the physical visit required to view them seems totally unnecessary on its face.

It’s a shame that to be able to run an above-board _Internet_ Archive one needs to bend to the whim of anachronistic copyright law and forgo all the benefits of the internet in the first place. This seems like it would inevitably mean that any _internet_ archive that is truly accessible over the _internet_ would be forced to operate illegally, much like Sci-Hub.

I know I hold a rather strong opinion regarding copyright law (I’m not looking to debate it here, as I know others hold different opinions, which is totally fine), but IMHO copyright law has been a major blight on humanity at large and especially on the internet. Major reform is in order at the very least, if not total abolition.

db48x (No.41909636)
Yeah, it’s pretty weird. There’s no technical reason for it, merely a legal one.