
252 points by lgats | 1 comment

I have been struggling with a bot ('Mozilla/5.0 (compatible; crawler)') coming from AWS Singapore that has been sending an absurd number of requests to a domain of mine, averaging over 700 requests/second for several months now. Thankfully, Cloudflare is able to handle the traffic with a simple WAF rule and a 444 response, which keeps the outbound traffic down.
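As a rough illustration (not Cloudflare's actual WAF engine, just the shape of the rule described above): match the bot's exact User-Agent and answer with 444, nginx's convention for "close the connection without a response".

```python
# Sketch of a UA-based block rule. The function name and 200 fallback are
# illustrative; a real WAF rule would also match on ASN/IP range.
BLOCKED_UA = "Mozilla/5.0 (compatible; crawler)"

def waf_decision(user_agent: str) -> int:
    """Return 444 (close without response) for the offending bot,
    200 for everything else."""
    if user_agent == BLOCKED_UA:
        return 444
    return 200
```

Matching on the full UA string works here only because the bot sends a fixed, distinctive one; anything more generic would need an IP-range condition as well.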

I've submitted several complaints to AWS to get this traffic stopped; their typical follow-up is: "We have engaged with our customer, and based on this engagement have determined that the reported activity does not require further action from AWS at this time."

I've tried various 4XX responses to see if the bot will back off, and I've tried 30X redirects (which it follows), all to no avail.

The traffic is hitting numbers that require me to renegotiate my contract with Cloudflare, and it is otherwise a nuisance when reviewing analytics and logs.

I've considered redirecting the entirety of the traffic to the AWS abuse report page, but at this scale it's essentially a small DDoS network, and pointing it anywhere could be considered abuse in itself.

Has anyone else had a similar experience?

locusm No.45615029
I am dealing with a similar situation, and I kind of screwed up: I managed to get Google Ads suspended by blocking Singapore. I see a mix of traffic from AWS, Tencent, and Huawei Cloud at the moment. Currently I'm just scanning server logs and blocking IP ranges.
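One way the "scan logs, block ranges" step can be sketched: bucket request IPs into /24 networks with the standard-library `ipaddress` module and flag the noisy ones. The function name and threshold are placeholders, not anyone's actual tooling.

```python
import ipaddress
from collections import Counter

def noisy_ranges(ips, threshold):
    """Group request IPs into /24 networks and return (sorted) those
    whose request count meets the threshold."""
    counts = Counter(
        str(ipaddress.ip_network(f"{ip}/24", strict=False)) for ip in ips
    )
    return sorted(net for net, n in counts.items() if n >= threshold)
```

Feeding this the client-IP column of an access log gives candidate CIDR blocks for a firewall deny list; a /24 is a crude granularity, but it keeps the rule count manageable compared to per-IP blocks.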
crazygringo No.45619508
> I managed to get Google Ads suspended due to blocking Singapore

How did that happen, and why? I feel like a lot of people here would not want to make the same mistake, so details would be very welcome.

As long as pages weren't being served, there was never a case of ads being requested but not shown, so I don't understand why Ads would care.

kijin No.45623602
Not the parent, but it sounds like they blocked the entire country, including Googlebot's Singaporean IP ranges.

If your server returns different content when Google crawls it than when normal users visit, they may suspect that you are trying to game the system. And yes, they do check from multiple locations with non-Googlebot user agents.

I'm not sure whether showing an error page also counts as returning different content, but the problem could be exacerbated by any content you include on the error page unless you're careful with the response code. Definitely don't make it too friendly. Whitelist important business partners.
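If you do whitelist Googlebot while blocking a region, the published way to do it without trusting the User-Agent header is forward-confirmed reverse DNS: the IP's reverse lookup must end in googlebot.com or google.com, and that hostname must resolve back to the same IP. A sketch, with the DNS lookups injected as functions (in practice `socket.gethostbyaddr` / `socket.gethostbyname_ex`) so the logic is testable offline; the IPs below are examples, not an authoritative range list.

```python
def is_verified_googlebot(ip, reverse_dns, forward_dns):
    """Forward-confirmed reverse DNS check, per Google's documented
    verification procedure. `reverse_dns(ip)` returns a hostname or
    None; `forward_dns(host)` returns a list of IPs."""
    host = reverse_dns(ip)
    if not host or not host.endswith((".googlebot.com", ".google.com")):
        return False
    # Forward-confirm: the claimed hostname must resolve back to the IP.
    return ip in forward_dns(host)
```

Only requests that pass this check would bypass the geo-block; a crawler that merely spoofs the Googlebot User-Agent fails at the reverse-DNS step.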