"Non-consensually", as if you had to ask for permission to perform a GET request to an open HTTP server.
Yes, I know about weev. That was a travesty.
"Non-consensually", as if you had to ask for permission to perform a GET request to an open HTTP server.
Yes, I know about weev. That was a travesty.
robots.txt is a polite request to please not scrape these pages, because it's probably not going to be productive. It was never meant to be a binding agreement; otherwise there would be a stricter protocol around it.
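To see just how advisory it is, note that the check lives entirely in the client. Here's a rough sketch using Python's standard urllib.robotparser (the URL and user-agent string are just placeholders): a polite crawler asks first, and an impolite one simply skips the call.

```python
# Minimal sketch: the robots.txt check is something the *client* chooses to do.
# Nothing on the server side forces it.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder URL
rp.read()

# A well-behaved crawler asks before fetching; a badly-behaved one just fetches.
if rp.can_fetch("MyCrawler/1.0", "https://example.com/some/page.html"):
    print("robots.txt permits this fetch")
else:
    print("robots.txt asks us not to fetch this -- compliance is voluntary")
```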
It's kind of like leaving a note for the delivery driver saying "please don't leave packages on the porch." That's fine for low-stakes situations, but if package security is of utmost importance to you, you should arrange for certified delivery or pick the package up at the delivery center. Likewise, if enforcing a no-scraping rule is of utmost importance, you need to require an API token or some other form of authentication before you serve the pages.
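For contrast, here's roughly what actual enforcement looks like: the server refuses to hand over the page at all without credentials. This is just a sketch using Python's built-in http.server; the hard-coded token is a made-up placeholder, not how you'd manage credentials in practice.

```python
# Sketch: refuse to serve pages unless the request carries a valid token.
# EXPECTED_TOKEN is a placeholder; a real setup would check issued credentials,
# not compare against a hard-coded string.
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPECTED_TOKEN = "Bearer s3cr3t-example-token"

class TokenGatedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") != EXPECTED_TOKEN:
            self.send_response(401)  # no token, no content
            self.end_headers()
            self.wfile.write(b"Authentication required\n")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Here is the page you asked for\n")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), TokenGatedHandler).serve_forever()
```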
You are free to say "well, there is no mechanism to do that", and I would agree with you. That's the problem!
I suppose the ultimate solution would be browsers, operating systems, and hardware manufacturers co-operating to implement some system that cryptographically signs HTTP requests, attesting that each one was triggered by an actual, physical interaction between a human and a computing device.
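Just to sketch the shape such a thing might take (this is not any existing standard): imagine the device holds a key and stamps every request with a signed header the server can verify. The header name, the HMAC construction, and the key handling below are all invented for illustration; a real design would presumably use hardware-backed asymmetric keys plus some proof of physical input.

```python
# Hypothetical sketch: a device-held key signs each outgoing request so the
# server can verify the signature. Everything here (header name, key
# management, HMAC as the primitive) is invented for illustration.
import hashlib
import hmac
import time

DEVICE_KEY = b"hardware-backed-secret"  # stand-in for a key a real device would guard

def sign_request(method: str, url: str) -> dict:
    timestamp = str(int(time.time()))
    payload = f"{method} {url} {timestamp}".encode()
    signature = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    # The server would recompute the HMAC, compare, and check the timestamp
    # to reject replayed requests.
    return {"X-Human-Attestation": f"{timestamp}:{signature}"}

print(sign_request("GET", "https://example.com/article"))
```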
Though you don't have to think for very long to come up with all the collateral damage that would cause, or the ways bad actors could circumvent it anyway.
All in all, this whole issue seems more like a legal problem than a technical one.
He just doesn't want tools humans use to access content to be used in association with his content.
What he failed to realize is that if you eliminate the tools, the human cannot access the content anyway; they don't have the proper biological interfaces. Had he realized that, he'd have noticed that simply turning off his server fully satisfies the constraints.