211 points CrankyBear | 1 comment
krunck ◴[] No.45106065[source]
I just block them by User Agent string[1]. The rest that fake the UA get clobbered by rate limiting[2] on the web server. Not perfect, but our site is not getting hammered any more.

[1] https://perishablepress.com/ultimate-ai-block-list/

[2] https://github.com/jzdziarski/mod_evasive
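A minimal sketch of the two layers described above, written as plain Python rather than web server config. It is not how mod_evasive or the linked block list work internally. The `BLOCKED_UA` pattern is a hypothetical four-entry subset, and `RateLimiter` and `should_block` are illustrative names with made-up thresholds.

```python
import re
import time
from collections import defaultdict, deque

# Hypothetical short subset of crawler tokens; a real block list is far longer.
BLOCKED_UA = re.compile(r"(GPTBot|CCBot|Bytespider|ClaudeBot)", re.IGNORECASE)


class RateLimiter:
    """Sliding-window limiter: at most max_hits requests per IP per window."""

    def __init__(self, max_hits, window_seconds):
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


def should_block(user_agent, ip, limiter, now=None):
    """Layer 1: reject known crawler UAs. Layer 2: rate-limit everyone else."""
    if BLOCKED_UA.search(user_agent or ""):
        return True
    return not limiter.allow(ip, now)
```

The second layer only catches clients that send many requests from the same address, so it does nothing against a crawler that spreads its traffic thinly over a large pool of IPs.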

replies(1): >>45106323 #
braden_e ◴[] No.45106323[source]
There is a very large scale crawler that uses random valid user agents and a staggeringly large pool of IPs. I first noticed it because a lot of traffic was coming from Brazil and "HostRoyale" (ASN 203020). They send only a few requests a day from each IP, so rate limiting is not useful.

I run a honeypot that generates URLs with the source IP, so I am pretty confident it is all one bot. In the past 48 hours I have had over 200,000 IPs hit the honeypot.

I am pretty sure this is Bytedance. They occasionally hit these tagged honeypot URLs with their normal user agent and from their usual .sg datacenter.
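One way the IP-tagged honeypot URLs could work, as a rough sketch. The comment does not say how the tagging is actually done, so everything here is an assumption: the `/trap/` path, the `SECRET` key, the base64 encoding, and the truncated HMAC that keeps outsiders from forging tags.

```python
import base64
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical signing key, not from the original post


def _sign(payload):
    # Truncated HMAC: enough to detect forged or mangled tags in this sketch.
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()[:16]


def make_trap_url(source_ip, base="/trap/"):
    """Build a link that records which IP was served the page containing it."""
    payload = base64.urlsafe_b64encode(source_ip.encode()).decode().rstrip("=")
    return f"{base}{payload}.{_sign(payload)}"


def origin_of(path, base="/trap/"):
    """Return the IP the URL was originally served to, or None if invalid."""
    if not path.startswith(base):
        return None
    payload, _, sig = path[len(base):].partition(".")
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return base64.urlsafe_b64decode(padded).decode()
```

When a request for a trap URL arrives from one address and `origin_of` returns a different one, the two addresses can be logged as belonging to the same crawler, since only a client that fetched the original page could know that link.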

replies(3): >>45106389 #>>45107318 #>>45107468 #
1. candlemas ◴[] No.45107318[source]
My site has also recently been getting massively hit by Brazilian IPs. It lasts for a day or two, even if they are being blocked.