
770 points ta988 | 2 comments
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't back off under load the way I would expect from Googlebot. It just kept requesting more and more until my server crashed, then it would back off for a minute and start requesting again.
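For reference, the kind of back-off a well-behaved crawler is usually expected to do can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Googlebot or Meta's crawler actually works; the function name `next_delay` and the parameters `base`, `factor`, and `max_delay` are made up for this example.

```python
def next_delay(current_delay, status_code, base=1.0, factor=2.0, max_delay=300.0):
    """Return how many seconds to wait before the next request to the same host.

    Hypothetical helper: slow down when the server signals distress,
    and ease back toward the base delay once it recovers.
    """
    # 429 (Too Many Requests) and any 5xx response are treated as overload signals.
    overloaded = status_code == 429 or status_code >= 500
    if overloaded:
        # Multiply the wait, but never exceed the cap.
        return min(max(current_delay, base) * factor, max_delay)
    # Healthy response: shrink the wait, but never go below the base delay.
    return max(current_delay / factor, base)


if __name__ == "__main__":
    delay = 1.0
    for status in (503, 503, 503, 200, 200):
        delay = next_delay(delay, status)
        print(status, delay)
```

The point is that the delay grows while the server keeps failing, instead of resetting to full speed after a fixed pause.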

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.
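A rough Python sketch of the difference between those two layers, under some assumptions: the robots.txt content, the helper names `robots_allows` and `should_block`, and the example User-Agent strings are all illustrative, and the Cloudflare rule itself is not reproduced here. The first helper shows what a compliant bot would conclude from robots.txt; the second is a server-side check that does not depend on the bot cooperating.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: ask Meta's crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical block list of lowercase User-Agent substrings.
BLOCKED_AGENT_TOKENS = ("meta-externalagent",)


def robots_allows(user_agent, url, robots_txt=ROBOTS_TXT):
    """What a bot that honors robots.txt would decide (purely advisory)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_block(user_agent_header):
    """Server-side enforcement: reject based on the User-Agent header."""
    ua = (user_agent_header or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    ua = "Meta-ExternalAgent/1.0"  # illustrative string, not the exact real header
    print(robots_allows(ua, "https://example.com/page"))
    print(should_block(ua))
```

A User-Agent check only stops bots that identify themselves honestly, so it is still a weaker guarantee than it looks.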

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
CoastalCoder ◴[] No.42551410[source]
I wonder if it would work to send Meta's legal department a notice that they are not permitted to access your website.

Would that make subsequent accesses be violations of the U.S.'s Computer Fraud and Abuse Act?

replies(3): >>42551475 #>>42551557 #>>42551847 #
betaby ◴[] No.42551475[source]
Crashing wasn't the intent. And scraping is legal, as I remember from the LinkedIn case.
replies(3): >>42551556 #>>42551790 #>>42551791 #
azemetre ◴[] No.42551556[source]
There’s a fine line between scraping and DDoS’ing, I’m sure.

Just because you manufacture chemicals doesn’t mean you can legally dump your toxic waste anywhere you want (well, you shouldn’t be allowed to, at least).

You also shouldn’t be able to set your crawlers loose in a way that causes sites to fail.

replies(2): >>42551594 #>>42576313 #
acedTrex ◴[] No.42551594{3}[source]
Intent is likely very important to something like a DDoS charge.
replies(4): >>42551704 #>>42551735 #>>42551816 #>>42551902 #
1. RF_Savage ◴[] No.42551816{4}[source]
So have the stresser and stress-testing DDoS-for-hire sites changed to scraping yet?
replies(1): >>42559369 #
2. acedTrex ◴[] No.42559369[source]
The courts will likely be able to discern between "good faith" scraping and a DDoS-for-hire service masquerading as scraping.