
646 points blendergeek | 1 comments
kerkeslager ◴[] No.42727510[source]
Question: do these bots not respect robots.txt?

I haven't added these scrapers to my robots.txt on the sites I work on yet because I haven't seen any problems. I would run something like this on my own websites, but I can't see selling my clients on running this on their websites.

The websites I run generally have a honeypot page which is linked in the headers and disallowed to everyone in the robots.txt, and if an IP visits that page, they get added to a blocklist which simply drops their connections without response for 24 hours.
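
For anyone who wants to try this, here is a minimal sketch of the idea in Python. It is not the actual setup described above: the `/honeypot/` path, the `HoneypotBlocklist` class, and the in-memory dict are all assumptions made for illustration. A real deployment would more likely push the IP into a firewall or reverse-proxy rule so the connection is dropped before it reaches the application.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical honeypot path; any URL no human would visit works.
HONEYPOT_PATH = "/honeypot/"
BLOCK_SECONDS = 24 * 60 * 60  # 24-hour block, as described above

# robots.txt that disallows the honeypot for every user agent.
ROBOTS_TXT = f"User-agent: *\nDisallow: {HONEYPOT_PATH}\n"


class HoneypotBlocklist:
    """In-memory blocklist keyed by IP, with a fixed expiry per entry."""

    def __init__(self, ttl=BLOCK_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry can be tested
        self._blocked_until = {}

    def is_blocked(self, ip):
        expiry = self._blocked_until.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Block has expired; forget the IP.
            del self._blocked_until[ip]
            return False
        return True

    def should_drop(self, ip, path):
        """Record honeypot hits and report whether to drop this request."""
        if path.startswith(HONEYPOT_PATH):
            self._blocked_until[ip] = self._clock() + self._ttl
        return self.is_blocked(ip)


def well_behaved_bot_may_fetch(path):
    """Check what a crawler that honors robots.txt would conclude."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)
```

The request handler (or middleware) would call `should_drop(ip, path)` on every request and close the connection without a response when it returns `True`. Crawlers that honor robots.txt never request the honeypot, so only clients that ignore it end up blocked.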

replies(3): >>42727689 #>>42727693 #>>42727959 #
0xf00ff00f ◴[] No.42727959[source]
> The websites I run generally have a honeypot page which is linked in the headers and disallowed to everyone in the robots.txt, and if an IP visits that page, they get added to a blocklist which simply drops their connections without response for 24 hours.

I love this idea!

replies(1): >>42732436 #
1. griomnib ◴[] No.42732436[source]
Yeah, this is elegant as fuck.