Cloudflare.com's Robots.txt

(www.cloudflare.com)
145 points by sans_souse | 2 comments
ck2 No.42165342
Easy guess: its length breaks some legacy stuff.

But every robots.txt should have an auto-ban trap line.

I.e., crawl it and die.

Basically, a script that adds the requesting IP to the firewall.

Of course, it's possible to abuse that, so it has to be monitored.
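A minimal sketch of the trap idea, assuming a hypothetical trap path `/no-crawl-trap/` that appears only as a `Disallow` line in robots.txt and is linked nowhere else. Well-behaved crawlers never fetch disallowed paths, so a request there suggests a client that ignored (or mined) robots.txt. The class name `TrapBanner`, the allowlist, and the returned `iptables`/`ip6tables` command are illustrative choices, not any particular site's setup. The sketch returns the firewall command instead of running it, and logs every ban so a human can review it:

```python
import ipaddress
import logging

# Hypothetical trap path: listed as Disallow in robots.txt, linked nowhere else.
TRAP_PATH = "/no-crawl-trap/"

ROBOTS_TXT = f"""User-agent: *
Disallow: {TRAP_PATH}
"""

log = logging.getLogger("robots-trap")


class TrapBanner:
    """In-memory sketch of a robots.txt trap that bans offending IPs (hypothetical)."""

    def __init__(self, allowlist=()):
        # Addresses that must never be banned (e.g. your own monitoring hosts),
        # a simple guard against the trap being used to lock you out.
        self.allowlist = {ipaddress.ip_address(a) for a in allowlist}
        self.banned = set()

    def is_banned(self, ip: str) -> bool:
        return ipaddress.ip_address(ip) in self.banned

    def handle_request(self, ip: str, path: str):
        """Return an iptables argv list if this request should trigger a ban, else None."""
        addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        if not path.startswith(TRAP_PATH):
            return None
        if addr in self.allowlist or addr in self.banned:
            return None
        self.banned.add(addr)
        # Log every ban so it can be monitored and reverted if it was a mistake.
        log.warning("banning %s for requesting %s", addr, path)
        tool = "ip6tables" if addr.version == 6 else "iptables"
        return [tool, "-A", "INPUT", "-s", str(addr), "-j", "DROP"]
```

For example, `TrapBanner().handle_request("198.51.100.7", "/no-crawl-trap/x")` returns `["iptables", "-A", "INPUT", "-s", "198.51.100.7", "-j", "DROP"]`, which a privileged process could then pass to `subprocess.run`. One way such a trap can be abused: a third-party page can embed the trap URL (say, as an image), so ordinary visitors' browsers request it and get banned. That is part of why the ban log and allowlist matter.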

1. okdood64 No.42165349
How do you distinguish a crawler agent from a human? Is it as easy as noticing that they cover something like 80%+ of the site in one visit, fairly quickly?
2. SoftTalker No.42165697
Crawlers/archivers will be hitting your site much faster than a human user.
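Both signals from this exchange (request rate, and how much of the site one client covers) can be combined in a per-IP sliding-window check. This is a rough sketch: the class name `CrawlerHeuristic` and the thresholds (60 s window, 30 requests, 80% coverage) are illustrative assumptions, not tuned values, and `site_page_count` is assumed to be known. Timestamps are passed in explicitly so the logic is easy to test; a real server would use something like `time.monotonic()`:

```python
from collections import defaultdict, deque


class CrawlerHeuristic:
    """Flags clients whose request rate or site coverage looks automated.

    All thresholds are illustrative assumptions, not recommended values.
    """

    def __init__(self, site_page_count, window_s=60.0, max_requests=30, max_coverage=0.8):
        self.site_page_count = site_page_count
        self.window_s = window_s
        self.max_requests = max_requests
        self.max_coverage = max_coverage
        # Per-IP deque of (timestamp, path) for requests inside the window.
        self.history = defaultdict(deque)

    def observe(self, ip, path, now):
        """Record a request at time `now` (seconds); return True if it looks like a crawler."""
        q = self.history[ip]
        q.append((now, path))
        # Evict requests that have fallen out of the sliding window.
        while q and now - q[0][0] > self.window_s:
            q.popleft()
        too_fast = len(q) > self.max_requests
        distinct_pages = len({p for _, p in q})
        too_broad = distinct_pages / self.site_page_count >= self.max_coverage
        return too_fast or too_broad
```

With the defaults, a client fetching 40 distinct pages at 0.1 s intervals is flagged on its 31st request (rate), while on a 10-page site a slow client is flagged once it has seen 8 distinct pages within the window (coverage). Neither signal is conclusive: polite crawlers throttle themselves, and many humans can share one IP behind NAT.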