AI scrapers request commented scripts

1. ang_cire ◴[31 Oct 25 21:29 UTC] No.45776912[source]▶

Yep. Robots.txt is a framework intended for performance, not a legal or ethical imperative.
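To make the "not an imperative" point concrete: the robots.txt check runs on the crawler's side, so it only happens if the crawler chooses to run it. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration, not taken from any real site or bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch this from the site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Nothing on the server enforces this answer. A crawler that never
    calls this function can still request the URL.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        for url in ("https://example.com/index.html",
                    "https://example.com/private/notes.html"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that skips the call, or ignores its result, gets the same HTTP response as everyone else, which is why actual enforcement has to live in server-side access controls.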

If you want to control how someone accesses something, the onus is on you to put access controls in place.

The people who put things on a public, un-restricted server and then complain that the public accessed it in an un-restricted way might be excusable if it's some geocities-esque Mom and Pop site that has no reason to know better, but 'cryptography dog' ain't that.

replies(2): >>45780141 #>>45780987 #

2. 1gn15 ◴[01 Nov 25 00:41 UTC] No.45778254[source]▶

>>45776797 (TP) #

Thank you <3

3. ordu ◴[01 Nov 25 08:42 UTC] No.45780141[source]▶

>>45776912 #

It is an asocial state of mind. We have locks and security systems that prevent people from stealing. But if all people agreed not to steal, then we could save that effort for something better. The ideal approach doesn't work with stealing, and now it doesn't work with HTTP either. It just raises costs for society with no lasting benefit for anyone: site owners just figure out ways to restrict access, and then there is no more scraping of the pages that they do not want scraped.

A healthy society relies on cooperation between its members. It relies on them accepting some rules that limit their behavior. Like we agreed not to kill others, and now I can go outside without weapons and anti-bullet defenses.

4. Anamon ◴[01 Nov 25 12:02 UTC] No.45780987[source]▶

>>45776912 #

What controls do you suggest?

Saying that a handful of mass copyright infringers with billion-dollar investors are simply part of the "public" like every regular visitor is seriously distorting the issue here.

Sites with a robots.txt banning bots are only "unrestricted" in a strictly technical sense. They are clearly setting terms of use that these rogue bots are violating. Besides, robots.txt is legally binding in certain jurisdictions; it's not just a polite plea. And if we decide that anything not technically prevented is legal, then we're also legitimising botnets, DDoS attacks, and a lot more. Hacking into a corporate system through a misconfiguration or vulnerability is also illegal, despite the fact that the defenses failed.

Finally, we all know that the only purpose these bots are scraping for is mass copyright infringement. That's another layer where the "if it's accessible, it's fair game" logic falls apart. I can download a lot of publicly accessible art, music, or software, but that doesn't mean I can do with those files whatever I want. The only reason these AI companies haven't been sued out of existence yet, like they should've been, is that it's trickier to prove provenance than if they straight up served the unmodified files.