←back to thread

597 points classichasclass | 1 comments | | HN request time: 0.282s | source
Show context
firefoxd ◴[] No.45010698[source]
Since I posted an article here about using zip bombs [0], I'm flooded with bots. I'm constantly monitoring and tweaking my abuse detector, but this particular bot mentioned in the article seemed to be pointing to an RSS reader. I white listed it at first. But now that I gave it a second look, it's one of the most rampant bot on my blog.

[0]: https://news.ycombinator.com/item?id=43826798

replies(3): >>45011175 #>>45011249 #>>45012142 #
1. JdeBP ◴[] No.45012142[source]
One of the few manual deny-list entries that I have made was not for a Chinese company, but for the ASes of the U.S.A. subsidiary of a Chinese company. It just kept coming back again and again, quite rapidly, for a particular page that was 404. Not for any other related pages, mind. Not for the favicon, robots.txt, or even the enclosing pseudo-directory. Just that 1 page. Over and over.

The directory structure had changed, and the page is now 1 level lower in the tree, correctly hyperlinked long since, in various sitemaps long since, and long since discovered by genuine HTTP clients.

The URL? It now only exists in 1 place on the WWW according to Google. It was posted to Hacker News back in 2017.

(My educated guess is that I am suffering from the page-preloading fallout from repeated robotic scraping of old Hacker News stuff by said U.S.A. subsidiary.)