
770 points ta988 | 5 comments
markerz No.42551173
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't do performance back-off the way I would expect from Googlebot. It just kept requesting more and more until my server crashed, then it would back off for a minute and start requesting again.
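For comparison, here is one common way a polite crawler can throttle itself per host: double the delay between requests whenever the server signals overload (429 or a 5xx, honoring `Retry-After` when present) and shrink it only slowly after successful responses. This is a hypothetical Python sketch; `CrawlThrottle`, the status set, and the constants are illustrative assumptions, not a description of how Googlebot or Meta's agent actually work.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of statuses treated as "slow down" signals.
SLOW_DOWN_STATUSES = {429, 500, 502, 503, 504}


@dataclass
class CrawlThrottle:
    """Per-host delay that grows exponentially on overload signals
    and shrinks gradually once the server recovers (illustrative sketch)."""
    min_delay: float = 1.0    # seconds between requests when the host is healthy
    max_delay: float = 600.0  # upper bound on the wait, in seconds
    delay: float = 1.0        # current wait, in seconds

    def record(self, status: int, retry_after: Optional[float] = None) -> float:
        """Update the delay from one response and return the next wait in seconds."""
        if status in SLOW_DOWN_STATUSES:
            # Back off: double the gap after every overload signal.
            self.delay = min(self.max_delay, self.delay * 2)
            if retry_after is not None:
                # Wait at least as long as the server asked (still bounded by max_delay).
                self.delay = min(self.max_delay, max(self.delay, retry_after))
        else:
            # Recover slowly: shrink the gap by 10% per successful response.
            self.delay = max(self.min_delay, self.delay * 0.9)
        return self.delay
```

A fetch loop would call `time.sleep(throttle.record(status))` after each response. The slow recovery is the design choice that matters here: instead of the pattern described above (hammer, crash, pause a minute, hammer again), the crawler only returns to full speed after a long run of healthy responses.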

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added nofollow to more links and added a robots.txt, but those are just suggestions and some bots seem to ignore them.
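The difference between "suggestion" and "block" can be shown in a few lines. This is a minimal Python sketch, assuming the bot's User-Agent contains the token `meta-externalagent` (check Meta's crawler docs for the exact string). `robots_allows` uses the standard library's `urllib.robotparser` to show what a well-behaved crawler would conclude from robots.txt, while `is_blocked` and the WSGI middleware `block_bots` (all hypothetical names) enforce a refusal on the server regardless of what the bot reads. The middleware is a stand-in for the idea behind the Cloudflare rule, not a reproduction of it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Meta's crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: *
Allow: /
"""

# Assumed User-Agent substrings to refuse outright (matched case-insensitively).
BLOCKED_UA_TOKENS = ("meta-externalagent",)


def robots_allows(user_agent: str, url: str) -> bool:
    """What a crawler that honors robots.txt would decide; nothing enforces it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def is_blocked(user_agent: str) -> bool:
    """Server-side check that applies whether or not the bot reads robots.txt."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


def block_bots(app):
    """WSGI middleware: answer 403 to blocked User-Agents, pass everything else on."""
    def wrapper(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper
```

Both checks key on the User-Agent header, which a client can set to anything, so a bot that lies about its identity slips past either one; that gap is what the behavior-based "suspected AI bot" detection mentioned below tries to cover.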

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

CoastalCoder No.42551410
I wonder if it would work to send Meta's legal department a notice that they are not permitted to access your website.

Would that make subsequent accesses violations of the U.S. Computer Fraud and Abuse Act?

jahewson No.42551557
No, fortunately random hosts on the internet don’t get to write a letter and make something a crime.
throwaway_fai No.42551751
Unless they're a big company, in which case they can DMCA anything they want and get the benefit of the doubt.
BehindBlueEyes No.42551842
Can you even send a DMCA takedown to crawlers?
throwaway_fai No.42551866
Doubt it; a vanilla cease-and-desist letter would probably be the approach there. I doubt any large AI company would pay attention, though, since even if they're in the wrong, they can outspend almost anyone in court.
Nevermark No.42570135
Small claims court?