
257 points ColinWright | 6 comments
Noumenon72 ◴[] No.45774469[source]
It doesn't seem that abusive. I don't comment things out thinking "this will keep robots from reading this".
replies(2): >>45774493 #>>45774628 #
1. mostlysimilar ◴[] No.45774628[source]
The article mentions using this as a means of detecting bots, not as a complaint that it's abusive.

EDIT: I was chastised, here's the original text of my comment: Did you read the article or just the title? They aren't claiming it's abusive. They're saying it's a viable signal to detect and ban bots.

replies(3): >>45774645 #>>45774743 #>>45776844 #
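The detection signal being discussed can be sketched in a few lines: serve a URL that appears *only* inside an HTML comment, so no human following rendered links ever requests it, then flag any client that does. The path name and Common Log Format assumption below are illustrative, not from the article:

```python
# Hypothetical honeypot path -- it appears only inside an HTML comment,
# so a browser rendering the page will never request it.
HONEYPOT_PATH = "/static/x9q-trap.html"

PAGE = f"""<html><body>
<p>Real content here.</p>
<!-- <a href="{HONEYPOT_PATH}">old link</a> -->
</body></html>"""

def bot_ips(log_lines):
    """Return client IPs that requested the honeypot path.

    Assumes Common Log Format, where the first field is the client IP
    and the quoted request line contains 'GET <path>'.
    """
    hits = set()
    for line in log_lines:
        if f'"GET {HONEYPOT_PATH}' in line:
            hits.add(line.split()[0])  # first CLF field: client IP
    return hits
```

Any IP returned by `bot_ips` parsed the raw HTML (including comments) rather than rendering it, which is the signal the article proposes acting on.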
2. pseudalopex ◴[] No.45774645[source]
Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".[1]

[1] https://news.ycombinator.com/newsguidelines.html

3. woodrowbarlow ◴[] No.45774743[source]
the first few words of the article are:

> Last Sunday I discovered some abusive bot behaviour [...]

replies(2): >>45774770 #>>45774783 #
4. mostlysimilar ◴[] No.45774770[source]
> The robots.txt for the site in question forbids all crawlers, so they were either failing to check the policies expressed in that file, or ignoring them if they had.
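For reference, a well-behaved crawler consults robots.txt before fetching anything, which Python's standard library supports directly via `urllib.robotparser`. The policy lines below mirror the "forbids all crawlers" case quoted above:

```python
from urllib.robotparser import RobotFileParser

# Parse a "deny everything to everyone" policy, the case described above.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Any user agent asking about any URL on the site gets a "no".
allowed = rp.can_fetch("MyBot/1.0", "https://example.com/any/page")
print(allowed)  # False
```

A crawler that skips this check, or ignores its answer, is exactly the behaviour the article's author observed.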
5. foobarbecue ◴[] No.45774783[source]
Yeah but the abusive behavior is ignoring robots.txt and scraping to train AI. Following commented URLs was not the crime, just evidence inadvertently left behind.
6. ang_cire ◴[] No.45776844[source]
They call the scrapers "malicious", so they are definitely complaining about them.

> A few of these came from user-agents that were obviously malicious:

(I love the idea that they consider any python or go request to be a malicious scraper...)