←back to thread

255 points ColinWright | 1 comments | | HN request time: 0.239s | source
Show context
bakql ◴[] No.45775259[source]
>These were scrapers, and they were most likely trying to non-consensually collect content for training LLMs.

"Non-consensually", as if you had to ask for permission to perform a GET request to an open HTTP server.

Yes, I know about weev. That was a travesty.

replies(15): >>45775283 #>>45775392 #>>45775754 #>>45775912 #>>45775998 #>>45776008 #>>45776055 #>>45776210 #>>45776222 #>>45776270 #>>45776765 #>>45776932 #>>45777727 #>>45777934 #>>45778166 #
grayhatter ◴[] No.45777727[source]
If you're lying in the requests you send, to trick my server into returning the content you want, instead of what I would want to return to webscrapers, that's non-consensual.

You don't need my permission to send a GET request, I completely agree. In fact, by having a publicly accessible webserver, there's implied consent that I'm willing to accept reasonable, and valid GET requests.

But I have configured my server to spend server resources the way I want, you don't like how my server works, so your configure your bot to lie. If you get what you want only because you're willing to lie, where's the implied consent?

replies(2): >>45778691 #>>45780948 #
batch12 ◴[] No.45778691[source]
Browser user agents have a history of being lies from the earliest days of usage. Official browsers lied about what they were- and still do.
replies(2): >>45779898 #>>45782316 #
1. grayhatter ◴[] No.45782316[source]
Can you give a single example of a browser with a user agent that lies about it's real origin?

The best I can come up with is the TOR browser, which will reduce the number of bits of information it will return, but I dont consider that to be misleading. It's a custom build of firefox, that discloses it is firefox, and otherwise behaves exactly as I would expect firefox to behave.