Maybe that's a way to defend against bots that ignore robots.txt: publish a honeypot HTML file full of garbage text, disallow it in robots.txt, and put the only link to it inside an HTML comment. No human will ever follow it, and any well-behaved crawler will skip it, so whatever fetches it has provably ignored both.
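Something along those lines could be wired up in a few lines. Here's a minimal sketch with Flask; the trap path /trap-x9.html, the in-memory ban set, and the matching Disallow line in robots.txt are all assumptions of mine, not an established recipe:

    # Sketch only: honeypot page plus IP ban. /trap-x9.html is an
    # arbitrary placeholder path; robots.txt is assumed to Disallow it.
    from flask import Flask, abort, request

    app = Flask(__name__)
    banned = set()  # a real setup would push bans to a firewall instead

    PAGE = """<html><body>
    <p>Normal content for normal visitors.</p>
    <!-- No human ever sees this link; only parsers reading raw HTML do:
         <a href="/trap-x9.html">archive</a> -->
    </body></html>"""

    @app.before_request
    def drop_banned_clients():
        # Anything that already fetched the trap gets refused outright.
        if request.remote_addr in banned:
            abort(403)

    @app.route("/")
    def index():
        return PAGE

    @app.route("/trap-x9.html")
    def honeypot():
        # Reaching this URL means the client followed a commented-out
        # link and ignored robots.txt: ban it and serve garbage text.
        banned.add(request.remote_addr)
        return "lorem ipsum garbage " * 1000

    if __name__ == "__main__":
        app.run()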
Currently we have at least three problems:
1) Companies see no problem with omitting sources and never linking back to the sites they scrape.
2) There are too many scrapers; even if they all behaved, some sites would struggle to handle the combined load.
3) Scrapers go full throttle 24/7, expecting sites to rate-limit them if they're going too fast (a toy version of such a limiter is sketched after this list). They hammer a site into the ground, wait until it's back up, and hammer it again, grabbing what they can before it crashes once more.
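For what it's worth, the rate limiting that scrapers offload onto every site owner is simple to describe. A toy per-IP token bucket might look like this; RATE and BURST are arbitrary picks, and this is a sketch, not production code:

    # Toy per-IP token bucket rate limiter.
    import time
    from collections import defaultdict

    RATE = 2.0    # tokens refilled per second (arbitrary)
    BURST = 10.0  # bucket capacity, i.e. max burst size (arbitrary)

    buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

    def allow(ip: str) -> bool:
        b = buckets[ip]
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the burst size.
        b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
        b["ts"] = now
        if b["tokens"] >= 1.0:
            b["tokens"] -= 1.0
            return True
        return False  # caller should answer 429 Too Many Requests

Trivial as it is, every single site is expected to bolt this on just to stay up.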
There's no longer a sense that the internet belongs to all of us and that we need to make room for each other. Websites and human-generated content exist only as a resource to be strip-mined.