
597 points | classichasclass | 1 comment
alphazard ◴[] No.45013420[source]
I'm always a little surprised to see how many people take robots.txt seriously on HN. It's nice to see so many folks with good intentions.

However, it's obviously not a real solution. It depends on people knowing about it, and adding the complexity of checking it to their crawler. Are there other more serious solutions? It seems like we've heard about "micropayments" and "a big merkle tree of real people" type solutions forever and they've never materialized.

replies(2): >>45013470 #>>45013819 #
ralferoo ◴[] No.45013470[source]
> It depends on people knowing about it, and adding the complexity of checking it to their crawler.

I can't believe there's a bot writer who doesn't know about robots.txt. They're just so self-obsessed that they can't comprehend why the rules should apply to them, because obviously their project is special and it's everyone else's bots that cause trouble.
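For what it's worth, honoring robots.txt adds very little complexity to a crawler: Python ships a parser in the standard library. A minimal sketch (the user agent name and URLs are made up for illustration):

```python
# Check robots.txt rules before fetching, using the stdlib parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# A real crawler would do:
#   rp.set_url("https://example.com/robots.txt"); rp.read()
# Here we parse a sample file inline to keep the sketch self-contained.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("MyCrawler", "https://example.com/public/page"))   # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/page"))  # False
```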

replies(2): >>45013686 #>>45017982 #
1. jabroni_salad ◴[] No.45017982[source]
I know crawlies are for sure reading robots.txt because they keep getting themselves banned by my disallowed /honeytrap page which is only advertised there.
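The trap described above can be sketched in a few lines: serve a robots.txt that disallows a path linked from nowhere else, then ban any client that requests that path anyway. This is a toy illustration with stdlib `http.server`; the path name `/honeytrap` and the in-memory ban set are just for the sketch (a real setup would ban at the firewall or reverse proxy).

```python
# Honeytrap sketch: only clients that read robots.txt and deliberately
# ignore it will ever discover /honeytrap, so requesting it earns a ban.
from http.server import BaseHTTPRequestHandler, HTTPServer

ROBOTS_TXT = b"User-agent: *\nDisallow: /honeytrap\n"
banned_ips = set()  # toy ban list; a real server would persist this

class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ip = self.client_address[0]
        if ip in banned_ips:
            self.send_error(403, "Banned")
        elif self.path == "/robots.txt":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(ROBOTS_TXT)
        elif self.path == "/honeytrap":
            banned_ips.add(ip)  # advertised only in robots.txt
            self.send_error(403, "Banned")
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"hello\n")

    def log_message(self, *args):  # keep the sketch quiet
        pass

# To run: HTTPServer(("", 8000), TrapHandler).serve_forever()
```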