
252 points lgats | 4 comments

I have been struggling with a bot ('Mozilla/5.0 (compatible; crawler)') coming from AWS Singapore that has been sending an absurd number of requests to a domain of mine, averaging over 700 requests/second for several months now. Thankfully, CloudFlare is able to handle the traffic with a simple WAF rule and a 444 response to reduce the outbound traffic.
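For anyone without Cloudflare in front of their origin, here is a minimal origin-side sketch of the same user-agent rule, assuming a Python WSGI application (`hello_app` and `block_user_agent` are hypothetical names). WSGI can't silently drop the connection the way nginx's 444 does, so the sketch returns an empty 403 instead.

```python
from typing import Callable, Iterable

# The exact User-Agent string reported in the post.
BLOCKED_USER_AGENT = "Mozilla/5.0 (compatible; crawler)"


def block_user_agent(app: Callable) -> Callable:
    """Wrap a WSGI app so requests with the blocked User-Agent get an empty 403.

    WSGI has no way to close the connection without replying (nginx's 444),
    so this hypothetical middleware sends the smallest response it can.
    """

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if environ.get("HTTP_USER_AGENT", "") == BLOCKED_USER_AGENT:
            start_response("403 Forbidden", [("Content-Length", "0")])
            return [b""]
        # Everything else is passed through to the real application.
        return app(environ, start_response)

    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    # Hypothetical stand-in for the real site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = block_user_agent(hello_app)
```

An exact string match is deliberately narrow; a substring or regex match would catch variants but risks blocking legitimate clients.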

I've submitted several complaints to AWS to get this traffic stopped; their typical follow-up is: "We have engaged with our customer, and based on this engagement have determined that the reported activity does not require further action from AWS at this time."

I've tried various 4XX responses to see if the bot will back off, and I've tried 30X redirects (which it follows), all to no avail.

The traffic is hitting numbers that require me to renegotiate my contract with CloudFlare, and it is otherwise a nuisance when reviewing analytics/logs.
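For a sense of scale, here is a back-of-the-envelope calculation of what a sustained 700 requests/second adds up to. The post says "over 700", so these figures are lower bounds, and the 30-day month is an assumption.

```python
# Rough request volume implied by "over 700 requests/second".
RATE_PER_SECOND = 700
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

per_day = RATE_PER_SECOND * SECONDS_PER_DAY
per_30_day_month = per_day * 30  # assumes a 30-day month

print(f"{per_day:,} requests/day")            # 60,480,000 requests/day
print(f"{per_30_day_month:,} requests/month")  # 1,814,400,000 requests/month
```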

I've considered redirecting all of the traffic to the AWS abuse report page, but at this scale it's essentially a small DDoS network, and sending it anywhere could be considered abuse in itself.

Has anyone else had a similar experience?

1. pickle-wizard No.45622128
Do you have any legitimate traffic coming from AWS? My thought is to just drop all traffic from their ASN. Once they can't contact you for a while they'll move along and you could unblock.
2. kijin No.45623508
If it's all from a single AWS region, this is the way to go.
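A minimal sketch of the single-region case, assuming you block using AWS's published `ip-ranges.json` (whose `prefixes` entries carry `ip_prefix` and `region` fields) rather than the whole ASN. Singapore is the `ap-southeast-1` region. The sample data and the helper names `region_networks` and `is_blocked` are hypothetical: the prefixes are RFC 5737 documentation ranges, not real AWS addresses.

```python
import ipaddress
import json

# Hypothetical excerpt shaped like AWS's ip-ranges.json. The prefixes are
# RFC 5737 documentation ranges, NOT real AWS addresses; in practice you
# would download the current file from AWS.
SAMPLE_IP_RANGES = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "ap-southeast-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "ap-southeast-1", "service": "AMAZON"},
    {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "EC2"}
  ]
}
""")


def region_networks(ip_ranges: dict, region: str) -> list:
    """Return the IPv4 networks listed for one region (ap-southeast-1 = Singapore)."""
    # A set removes duplicates (the real file lists some prefixes under several services).
    nets = {
        ipaddress.ip_network(p["ip_prefix"])
        for p in ip_ranges["prefixes"]
        if p["region"] == region
    }
    # Merge adjacent/overlapping prefixes so lookups scan fewer entries.
    return list(ipaddress.collapse_addresses(nets))


def is_blocked(ip: str, networks: list) -> bool:
    """True if the client address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


singapore = region_networks(SAMPLE_IP_RANGES, "ap-southeast-1")
print(is_blocked("192.0.2.55", singapore))   # True
print(is_blocked("203.0.113.9", singapore))  # False
```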

I tend to be careful with residential or office IP ranges. But if it looks like a datacenter, it will be blocked, no second thoughts. Especially if it's a cloud provider that makes it too easy for customers to rotate IPs. Identify the ASN within which they're rotating their IPs, and block it. This is much more effective than blocking based on arbitrary CIDRs or geographical boundaries.
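Identifying the ASN from logs can be sketched as a longest-prefix match of each client IP against a prefix-to-ASN table, followed by a count per ASN. The table, log sample, and function names here are hypothetical placeholders (RFC 5737 documentation prefixes and RFC 5398 documentation ASNs); in practice the mapping would come from an IP-to-ASN dataset.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-ASN table: RFC 5737 documentation prefixes mapped
# to RFC 5398 documentation ASNs, standing in for a real IP-to-ASN dataset.
PREFIX_TO_ASN = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64500,
    "203.0.113.0/24": 64501,
}

# Sort most-specific first so the first match is the longest-prefix match.
TABLE = sorted(
    ((ipaddress.ip_network(p), asn) for p, asn in PREFIX_TO_ASN.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def asn_for(ip: str):
    """Return the ASN of the most specific matching prefix, or None if uncovered."""
    addr = ipaddress.ip_address(ip)
    for net, asn in TABLE:
        if addr in net:
            return asn
    return None


def top_asns(client_ips, n=3):
    """Count requests per origin ASN and return the n busiest."""
    return Counter(asn_for(ip) for ip in client_ips).most_common(n)


# Hypothetical log sample: the bot rotates across several addresses in AS64500.
log_ips = ["192.0.2.10", "192.0.2.77", "198.51.100.3", "198.51.100.200", "203.0.113.5"]
print(top_asns(log_ips))  # [(64500, 4), (64501, 1)]
```

Because the count is keyed on ASN rather than on individual addresses, a bot rotating through many IPs still collapses into one large bucket.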

Unless you're running an API for developers, there's no legitimate (non-crawling) reason for someone to request your site from an AWS resource. Even less so for something like Huawei Cloud.

3. mat_epice No.45623794
> there's no legitimate (non-crawling) reason for someone to request your site from an AWS resource

I used to run an X instance in the cloud that I would sometimes browse websites from. It sucked but it was also legitimate.

4. kijin No.45623880
"Legitimate" is relative here. I would count you as using unusual software to hide your actual source address. Not a huge concern because if you're doing that, I assume you also know how to move around to avoid getting blocked.

In fact, the ability to move to a different cloud on short notice is also part of the CAPTCHA, because large cloud-based botnets usually can't. They'd get instabanned if they tried to move their crawling boxes to something like DigitalOcean.