←back to thread

770 points ta988 | 3 comments | | HN request time: 0s | source
Show context
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't do performance back-off the way I would expect from Google Bot. It just kept repeatedly requesting more and more until my server crashed, then it would back off for a minute and then request more again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
devit[dead post] ◴[] No.42551513[source]
[flagged]
jsheard ◴[] No.42551599[source]
That's right, getting DDOSed is a skill issue. Just have infinite capacity.
replies(1): >>42551648 #
1. devit ◴[] No.42551648[source]
DDOS is different from crashing.

And I doubt Facebook implemented something that actually saturates the network, usually a scraper implements a limit on concurrent connections and often also a delay between connections (e.g. max 10 concurrent, 100ms delay).

Chances are the website operator implemented a webserver with terrible RAM efficiency that runs out of RAM and crashes after 10 concurrent requests, or that saturates the CPU from simple requests, or something like that.

replies(2): >>42551678 #>>42553042 #
2. adamtulinius ◴[] No.42551678[source]
You can doubt all you want, but none of us really know, so maybe you could consider interpreting people's posts a bit more generously in 2025.
3. atomt ◴[] No.42553042[source]
I've seen concurrency in excess of 500 from Metas crawlers to a single site. That site had just moved all their images so all the requests hit the "pretty url" rewrite into a slow dynamic request handler. It did not go very well.