770 points ta988 | 3 comments
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive and doesn't back off under load the way I would expect from Googlebot. It kept requesting more and more until my server crashed, then it would back off for a minute and start requesting again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow attributes to links and a robots.txt, but those are only suggestions and some bots seem to ignore them.
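For anyone not behind Cloudflare, the same User-Agent check can be approximated at the application layer. This is a minimal sketch, not the Cloudflare rule itself: `BLOCKED_UA_TOKENS` and `is_blocked` are hypothetical names, and the only token taken from this comment is the Meta-ExternalAgent name.

```python
from typing import Optional

# Hypothetical block list. Only "meta-externalagent" comes from the comment
# above; extend it with whatever tokens show up in your own access logs.
BLOCKED_UA_TOKENS = ("meta-externalagent",)


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocked token.

    The match is a case-insensitive substring test, so version suffixes
    such as "/1.1" do not matter.
    """
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)
```

A request handler would call `is_blocked(request_headers.get("User-Agent"))` and answer with a 403 when it returns True. Like any User-Agent filter, it only stops bots that identify themselves honestly.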

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
jandrese ◴[] No.42551260[source]
If a bot ignores robots.txt that's a paddlin'. Right to the blacklist.
replies(2): >>42551339 #>>42551721 #
nabla9 ◴[] No.42551721[source]
The linked article explains what happens when you block their IP.
replies(1): >>42551923 #
1. gs17 ◴[] No.42551923[source]
For reference:

> If you try to rate-limit them, they’ll just switch to other IPs all the time. If you try to block them by User Agent string, they’ll just switch to a non-bot UA string (no, really).

It's really absurd that they seem to think this is acceptable.

replies(2): >>42552960 #>>42554822 #
2. viraptor ◴[] No.42552960[source]
Block the whole ASN in that case.
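A rough sketch of what that could look like, assuming you have already obtained the list of prefixes announced by the ASN from some routing data source (that lookup is not shown here). The prefixes below are placeholders from the reserved documentation ranges, and `in_blocked_asn` is a hypothetical helper name.

```python
import ipaddress

# Placeholder prefixes from the documentation ranges (RFC 5737 / RFC 3849).
# In practice these would be the prefixes announced by the ASN being blocked.
ASN_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def in_blocked_asn(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the listed prefixes."""
    addr = ipaddress.ip_address(client_ip)
    # Compare only against networks of the same IP version.
    return any(
        addr.version == net.version and addr in net for net in ASN_PREFIXES
    )
```

Unlike a per-IP block, this keeps matching when the crawler rotates to another address inside the same network, at the cost of also blocking any legitimate traffic from that network.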
3. therealdrag0 ◴[] No.42554822[source]
What about adding fake sleeps?
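One way to read "fake sleeps" is a tarpit: once a client exceeds a request budget, each further response is delayed by a growing amount. The sketch below is only an illustration under that assumption; the `Tarpit` class, its parameters, and its default values are all made up for the example. Given the quoted point about IP switching, a per-client key may be of limited use, and sleeping inside a request handler also ties up a worker on your side.

```python
import time
from collections import defaultdict, deque


class Tarpit:
    """Compute a per-client delay that grows once a request budget is used up."""

    def __init__(self, window=60.0, free_requests=30, base_delay=0.5,
                 max_delay=30.0, clock=time.monotonic):
        self.window = window                # seconds of history to keep
        self.free_requests = free_requests  # requests per window with no delay
        self.base_delay = base_delay        # delay for the first excess request
        self.max_delay = max_delay          # upper bound on any single delay
        self.clock = clock                  # injectable for testing
        self._hits = defaultdict(deque)

    def delay_for(self, client_key: str) -> float:
        """Record one request and return how long to sleep before replying."""
        now = self.clock()
        hits = self._hits[client_key]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        excess = len(hits) - self.free_requests
        if excess <= 0:
            return 0.0
        # Double the delay per excess request; cap the exponent and the result.
        return min(self.max_delay, self.base_delay * 2 ** min(excess - 1, 16))
```

A handler would call `time.sleep(tarpit.delay_for(client_ip))` (or the async equivalent) before sending the response.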