AI companies cause most of the traffic on forums

(pod.geraspora.de)

770 points ta988 | 1 comments | 30 Dec 24 14:37 UTC

markerz ◴[30 Dec 24 17:07 UTC] No.42551173[source]▶

One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't back off under load the way I would expect from Googlebot. It kept requesting more and more until my server crashed, then it would back off for a minute and start requesting again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.
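
For anyone doing the same check in application code rather than at Cloudflare, here is a minimal sketch of User-Agent matching. It is not Cloudflare's rule syntax, and the function name and token list are hypothetical; only the Meta-ExternalAgent name comes from the comment above. It also only catches bots that identify themselves honestly.

```python
# Hypothetical blocklist of lowercase substrings to look for in the
# User-Agent header. Only "meta-externalagent" is taken from the thread.
BLOCKED_AGENT_TOKENS = ("meta-externalagent",)


def is_blocked_user_agent(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if the User-Agent header contains any blocked token.

    The match is case-insensitive. A missing or empty header is not
    treated as blocked here; that is a policy choice, not a requirement.
    """
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in tokens)
```

A request handler would call `is_blocked_user_agent(headers.get("User-Agent"))` and return a 403 when it is true.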

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #

bodantogat ◴[30 Dec 24 17:50 UTC] No.42551649[source]▶

>>42551173 #

I see a lot of traffic that I can tell comes from bots based on the URL patterns they request. They do not identify themselves as bots in the user agent, and often use residential IP pools. I haven't found an easy way to block them. They nearly took out my site a few days ago too.

replies(5): >>42551680 #>>42551803 #>>42556117 #>>42558781 #>>42574346 #

petre ◴[02 Jan 25 13:46 UTC] No.42574346[source]▶

>>42551649 #

You rate limit them and then block the abusers. Nginx allows rate limiting. You can then block them using fail2ban for an hour if they're rate limited 3 times. If they get blocked 5 times you can block them forever using the recidive jail.
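
The escalation policy can be modeled roughly as below. This is a sketch of the policy only, not how Nginx or fail2ban implement it: the class and method names are hypothetical, and the 10-minute counting window is an assumption. The thresholds (3 rate-limit hits, a one-hour ban, permanent after 5 bans) are the ones from the paragraph above.

```python
from collections import defaultdict, deque


class BanTracker:
    """Toy model of 'rate limit, then ban, then ban forever'."""

    def __init__(self, max_hits=3, window=600.0, ban_seconds=3600.0, max_bans=5):
        self.max_hits = max_hits        # rate-limit hits that trigger a ban
        self.window = window            # assumed counting window, in seconds
        self.ban_seconds = ban_seconds  # length of a temporary ban
        self.max_bans = max_bans        # bans before the block is permanent
        self._hits = defaultdict(deque)
        self._ban_count = defaultdict(int)
        self._banned_until = {}
        self._permanent = set()

    def is_banned(self, ip, now):
        """Return True if ip is temporarily or permanently banned at time now."""
        if ip in self._permanent:
            return True
        return self._banned_until.get(ip, 0.0) > now

    def record_rate_limit_hit(self, ip, now):
        """Record that the rate limiter rejected a request from ip.

        Returns "ok", "banned", or "permanent".
        """
        if self.is_banned(ip, now):
            return "permanent" if ip in self._permanent else "banned"
        hits = self._hits[ip]
        hits.append(now)
        # Forget hits that fell out of the counting window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) < self.max_hits:
            return "ok"
        hits.clear()
        self._ban_count[ip] += 1
        if self._ban_count[ip] >= self.max_bans:
            self._permanent.add(ip)
            return "permanent"
        self._banned_until[ip] = now + self.ban_seconds
        return "banned"
```

Timestamps are passed in explicitly so the logic is easy to test; a real deployment would feed it log timestamps or `time.time()`.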

I've had massive AI bot traffic from M$ and blocked several IPs by adding manual entries to the recidive jail. If they come back and disregard robots.txt with disallow *, I will run 'em through fail2ban.

replies(1): >>42576295 #

herbst ◴[02 Jan 25 17:14 UTC] No.42576295[source]▶

>>42574346 #

Whatever M$ was doing still baffles me. I still have several Azure ranges in my blocklist because whatever this was appeared to change strategy once I implemented a ban method.

replies(1): >>42578388 #

petre ◴[02 Jan 25 20:20 UTC] No.42578388[source]▶

>>42576295 #

They were hammering our closed ticketing system for some reason. I blocked an entire C block and an individual IP. If needed I will not hesitate to ban all their ranges, which means we won't get any mail from Azure or M$ Office 365, since this is also our mail server. But screw 'em, I'll do it anyway until someone notices, since it's clearly abuse.
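
As a rough illustration of a blocklist that mixes a whole C block (a /24) with a single address, here is a sketch using Python's standard `ipaddress` module. The addresses are placeholders from the documentation-reserved ranges, not the actual Azure ranges, and the names are hypothetical.

```python
import ipaddress

# Placeholder entries from documentation-reserved ranges (RFC 5737),
# standing in for "an entire C block and an individual IP".
BLOCKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole /24 ("C block")
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def is_blocked_ip(addr, blocklist=BLOCKLIST):
    """Return True if addr falls inside any network in the blocklist."""
    ip = ipaddress.ip_address(addr)
    # Membership is False when the IP version differs from the network's,
    # so an IPv6 address never matches these IPv4 entries.
    return any(ip in net for net in blocklist)
```

In practice the same ranges would go into the firewall or the fail2ban jail; the point here is only how range membership is decided.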
