←back to thread

770 points ta988 | 3 comments | | HN request time: 0.721s | source
Show context
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't do performance back-off the way I would expect from Google Bot. It just kept repeatedly requesting more and more until my server crashed, then it would back off for a minute and then request more again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
MetaWhirledPeas ◴[] No.42551742[source]
> Cloudflare also has a feature to block known AI bots and even suspected AI bots

In addition to other crushing internet risks, add wrongly blacklisted as a bot to the list.

replies(4): >>42551773 #>>42552921 #>>42562510 #>>42564887 #
1. JohnMakin ◴[] No.42552921[source]
These features are opt-in and often paid features. I struggle to see how this is a "crushing risk," although I don't doubt that sufficiently unskilled shops would be completely crushed by an IP/userAgent block. Since Cloudflare has a much more informed and broader view of internet traffic than maybe any other company in the world, I'll probably use that feature without any qualms at some point in the future. Right now their normal WAF rules do a pretty good job of not blocking legitimate traffic, at least on enterprise.
replies(1): >>42553817 #
2. MetaWhirledPeas ◴[] No.42553817[source]
The risk is not to the company using Cloudflare; the risk is to any legitimate individual who Cloudflare decides is a bot. Hopefully their detection is accurate because a false positive would cause great difficulties for the individual.
replies(1): >>42562004 #
3. neilv ◴[] No.42562004[source]
For months, my Firefox was locked out of gitlab.com and some other sites I wanted to use, because CloudFlare didn't like my browser.

Lesson learned: even when you contact the sales dept. of multiple companies, they just don't/can't care about random individuals.

Even if they did care, a company successfully doing an extended three-way back-and-forth troubleshooting with CloudFlare, over one random individual, seems unlikely.