
252 points lgats | 8 comments

I have been struggling with a bot ('Mozilla/5.0 (compatible; crawler)', coming from AWS Singapore) that sends an absurd number of requests to a domain of mine, averaging over 700 requests/second for several months now. Thankfully, Cloudflare is able to handle the traffic with a simple WAF rule and a 444 response to reduce the outbound traffic.
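
To illustrate the kind of rule described above, here is a minimal Python sketch. It is not Cloudflare's rule syntax: the function name `status_for` and the choice to match the exact User-Agent string are assumptions made for the example. The 444 status is the non-standard code nginx uses to close a connection without sending a response.

```python
# Hypothetical sketch: pick a response status based on the User-Agent header.
# This is plain Python for illustration, not an actual WAF rule.

# User-Agent string reported in the post.
BLOCKED_USER_AGENT = "Mozilla/5.0 (compatible; crawler)"

# Non-standard status (from nginx): close the connection, send nothing back.
NO_RESPONSE_STATUS = 444


def status_for(user_agent: str, default_status: int = 200) -> int:
    """Return the status to answer with for a given User-Agent."""
    # Exact match only (an assumption); a real rule might also match on
    # source network or request rate.
    if user_agent.strip() == BLOCKED_USER_AGENT:
        return NO_RESPONSE_STATUS
    return default_status
```

Sending no body at all is what keeps outbound traffic low: the bot still costs an inbound request, but nothing is served in return.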

I've submitted several complaints to AWS to get this traffic to stop. Their typical follow-up is: "We have engaged with our customer, and based on this engagement have determined that the reported activity does not require further action from AWS at this time."

I've tried various 4XX responses to see if the bot will back off, and I've tried 30X redirects (which it follows), all to no avail.

The traffic is hitting numbers that require me to re-negotiate my contract with CloudFlare and is otherwise a nuisance when reviewing analytics/logs.
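
For a sense of scale, here is a back-of-the-envelope calculation in Python. It assumes the 700 requests/second average holds constant and uses a 30-day month; both are simplifications, and the function names are made up for the example.

```python
# Rough request-volume estimate from an average request rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(rate_per_second: int) -> int:
    """Total requests in one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def requests_per_month(rate_per_second: int, days: int = 30) -> int:
    """Total requests over `days` days (30 by default, an assumption)."""
    return requests_per_day(rate_per_second) * days


if __name__ == "__main__":
    rate = 700  # average requests/second quoted in the post
    print(f"{requests_per_day(rate):,} requests/day")
    print(f"{requests_per_month(rate):,} requests per 30-day month")
```

At 700 requests/second that works out to roughly 60 million requests a day, or about 1.8 billion over 30 days.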

I've considered redirecting the entirety of the traffic to the AWS abuse report page, but at this scale, it's essentially a small DDoS network, and sending it anywhere could be considered abuse in itself.

Have others had a similar experience?

neya ◴[] No.45614249[source]
I had this issue on one of my personal sites. It was a blog I used to write maybe 7-8 years ago. All of a sudden, I see insane traffic spikes in analytics. I thought some article went viral, but realized it was too robotic to be true. And so I narrowed it down to some developer trying to test their bot/crawler on my site. I tried asking nicely, several times, over several months.

I was so pissed off that I set up a redirect rule to send them over to random porn sites. That actually stopped it.

replies(1): >>45614924 #
1. sim7c00 ◴[] No.45614924[source]
this is the best approach honestly. redirect them to some place that undermines their efforts. either back to themselves, their own provider, or nasty crap that no one wants to find in their crawler logs.
replies(2): >>45615089 #>>45620298 #
2. throwaway422432 ◴[] No.45615089[source]
Goatse?

Wouldn't recommend Googling it. You either know or just take a guess.

replies(3): >>45619222 #>>45620485 #>>45621395 #
3. Rendello ◴[] No.45619222[source]
I googled a lot of shock sites after seeing them referenced and not knowing what they were. Luckily Google and Wikipedia tended to shield my innocent eyes while explaining what I should be seeing.

The first goatse I actually saw was in ASCII form, funnily enough.

replies(1): >>45622069 #
4. specialist ◴[] No.45620298[source]
Maybe someone will publish a "nastylist" for redirecting bots.

Decades later, I'm still traumatized by goatse, so it'll have to be someone with more fortitude than me.

replies(1): >>45625883 #
5. ◴[] No.45620485[source]
6. nosrepa ◴[] No.45621395[source]
The Jason Scott method.
7. antonymoose ◴[] No.45622069{3}[source]
I use the ASCII form to reply to spammers, since it usually won't trip an attachment filter or anything like that. I get mixed results from them, but the results are usually funny.
8. sim7c00 ◴[] No.45625883[source]
goatse, lemonparty, meatspin. take ur pick of the gross but clearnetable things.

mind you, before google and the like and the great purge of the internet, these things were mild and humorous...