
770 points ta988 | 3 comments
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive and doesn't back off under load the way I would expect from Googlebot. It kept requesting more and more until my server crashed, then it backed off for a minute and started requesting again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt, but those are only suggestions and some bots seem to ignore them.
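
For anyone not on Cloudflare, here is a rough sketch of the same two ideas in Python: a hard block on the User-Agent string, plus the robots.txt rule that a polite crawler would honor. Only the token `meta-externalagent` comes from the bot name above. The function names, the version suffix in the usage note, and the `example.com` URL are made up for illustration, and the actual Cloudflare rule syntax is not shown here.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Lowercased substrings to reject. Only this one token is taken from the
# comment above; add others at your own discretion.
BLOCKED_UA_TOKENS = ("meta-externalagent",)


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


# The advisory equivalent: a robots.txt rule. Bots are free to ignore it,
# which is why the hard block above is still needed.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /
"""


def robots_allows(user_agent: str, url: str) -> bool:
    """Check what a well-behaved crawler would conclude from ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

In a request handler you would return a 403 whenever `is_blocked(request_headers.get("User-Agent"))` is true. Note that this only stops bots that identify themselves honestly.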

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
bodantogat ◴[] No.42551649[source]
I see a lot of traffic that I can tell comes from bots based on the URL patterns they access. They do not include "bot" in the user agent, and often use residential IP pools. I haven't found an easy way to block them. They nearly took out my site a few days ago too.
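
To make the URL-pattern idea concrete, here is a minimal sketch that flags requests by path alone, since neither the user agent nor the IP is reliable in this case. The two regexes are pure assumptions (very deep pagination, stacked sort/filter parameters) and not the patterns actually seen on my site; you would substitute whatever shows up in your own logs.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical examples of "only a crawler would ask for this" URLs.
SUSPICIOUS_PATTERNS = [
    re.compile(r"[?&]page=\d{3,}"),                     # page 100 and beyond
    re.compile(r"(?:[?&](?:sort|filter)=[^&]*){2,}"),   # stacked facet params
]


def looks_automated(path: str) -> bool:
    """Return True if the request path matches any suspicious pattern."""
    return any(pattern.search(path) for pattern in SUSPICIOUS_PATTERNS)


def summarize(paths: Iterable[str]) -> Counter:
    """Count how many requested paths look automated versus normal."""
    return Counter(
        "suspicious" if looks_automated(path) else "normal" for path in paths
    )
```

This only tells you how much of the traffic fits the patterns. Deciding what to do with a match (rate limit, challenge, or block) is a separate choice, and real visitors will occasionally match too.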
replies(5): >>42551680 #>>42551803 #>>42556117 #>>42558781 #>>42574346 #
newsclues ◴[] No.42551680[source]
The amateurs at home are going to give the big companies what they want: an excuse for government regulation.
replies(2): >>42551716 #>>42563374 #
throwaway290 ◴[] No.42551716[source]
Just because it doesn't say it's a bot and doesn't come from a corporate IP doesn't mean it's not a bot, or that it isn't run by some "AI" company.
replies(1): >>42552086 #
1. bodantogat ◴[] No.42552086[source]
I have no way to verify this, but I suspect these are either stealth AI companies or data collectors who hope to sell training data to them.
replies(1): >>42552538 #
2. datadrivenangel ◴[] No.42552538[source]
I've heard that some mobile SDKs / Apps earn extra revenue by providing an IP address for VPN connections / scraping.
replies(1): >>42573972 #
3. odo1242 ◴[] No.42573972[source]
Chrome extensions too