
255 points ColinWright | 3 comments
bakql ◴[] No.45775259[source]
>These were scrapers, and they were most likely trying to non-consensually collect content for training LLMs.

"Non-consensually", as if you had to ask for permission to perform a GET request to an open HTTP server.

Yes, I know about weev. That was a travesty.

replies(15): >>45775283 #>>45775392 #>>45775754 #>>45775912 #>>45775998 #>>45776008 #>>45776055 #>>45776210 #>>45776222 #>>45776270 #>>45776765 #>>45776932 #>>45777727 #>>45777934 #>>45778166 #
Calavar ◴[] No.45775392[source]
I agree. It always surprises me when people are indignant about scrapers ignoring robots.txt and throw around words like "theft" and "abuse."

robots.txt is a polite request to please not scrape these pages because it's probably not going to be productive. It was never meant to be a binding agreement; otherwise there would be a stricter protocol around it.

It's kind of like leaving a note for the deliveryman saying please don't leave packages on the porch. It's fine for low-stakes situations, but if package security is of utmost importance to you, you should arrange to get it certified or to pick it up at the delivery center. Likewise, if enforcing a rule of no scraping is of utmost importance, you need to require an API token or some other form of authentication before you serve the pages.
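
To make the distinction concrete, here is a rough sketch, not anyone's actual setup. The robots.txt contents, the `/private/` prefix, the token value, and both function names are made up for illustration. The first function is the advisory check that only a polite client bothers to run; the second is the kind of server-side check that holds regardless of what the client chooses to do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents and token set, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
VALID_TOKENS = {"example-token"}


def polite_client_may_fetch(user_agent: str, url: str) -> bool:
    """Advisory check: only a client that chooses to ask is bound by it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def server_status(path: str, headers: dict) -> int:
    """Enforced check: the server refuses /private/ without a valid token."""
    if not path.startswith("/private/"):
        return 200
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme == "Bearer" and token in VALID_TOKENS:
        return 200
    return 401  # Unauthorized: no 200-series response is ever sent
```

A scraper that skips `polite_client_may_fetch` loses nothing, whereas `server_status` returns 401 for `/private/` no matter how the client behaves.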

replies(9): >>45775489 #>>45775674 #>>45776143 #>>45776484 #>>45776561 #>>45776927 #>>45777831 #>>45778192 #>>45779259 #
bigbuppo ◴[] No.45776143[source]
Seriously. Did you see what that web server was wearing? I mean, sure it said "don't touch me" and started screaming for help and blocked 99.9% of our IP space, but we got more and they didn't block that so clearly they weren't serious. They were asking for it. It's their fault. They're not really victims.
replies(1): >>45776208 #
jMyles ◴[] No.45776208[source]
Sexual consent is sacred. This metaphor is in truly bad taste.

When you return a response with a 200-series status code, you've granted consent. If you don't want to grant consent, change the logic of the server.

replies(5): >>45776310 #>>45776406 #>>45777226 #>>45778865 #>>45780805 #
Larrikin ◴[] No.45776406[source]
>I don't like how your metaphor is an effective metaphor for the situation so it's in bad taste.
replies(2): >>45776901 #>>45783417 #
1. jack_pp ◴[] No.45776901[source]
If you absolutely want a sexual metaphor, it's more like you snuck into a world record attempt for how many sexual partners a woman can take in 24h, and even though you aren't on the list you still got to smash.

The solution is the same: implement better security.

replies(2): >>45779014 #>>45779050 #
2. ◴[] No.45779014[source]
3. bigbuppo ◴[] No.45779050[source]
Thank you for finding the right metaphor. If there is a sign out front that has a list of individuals that should go away but they continue, they're in a lot of legal trouble. If they show a fake ID to the event organizers that are handling all the paperwork, that is also something that will land them in prison.