
770 points ta988 | 1 comments
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive and doesn't back off under load the way I would expect from Googlebot. It just kept requesting more and more until my server crashed, then it would back off for a minute and start requesting again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.
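For readers not behind Cloudflare, the same idea (reject requests whose User-Agent names the crawler) can be sketched at the application level. This is only an illustration, not the Cloudflare rule described above: it assumes a Python WSGI app, and the names `BLOCKED_UA_TOKENS`, `is_blocked`, and `BlockBotsMiddleware` are hypothetical. The token is simply the bot name from the comment, matched case-insensitively.

```python
# Hypothetical sketch: block requests by User-Agent substring in a WSGI app.
# Assumption: the crawler identifies itself with the name quoted in the thread.
BLOCKED_UA_TOKENS = ("meta-externalagent",)


def is_blocked(user_agent, tokens=BLOCKED_UA_TOKENS):
    """Return True if the User-Agent contains any blocked token (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in tokens)


class BlockBotsMiddleware:
    """WSGI middleware that answers 403 for blocked User-Agents."""

    def __init__(self, app, tokens=BLOCKED_UA_TOKENS):
        self.app = app
        self.tokens = tuple(t.lower() for t in tokens)

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT"), self.tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application.
        return self.app(environ, start_response)
```

Like any User-Agent rule, this only stops bots that identify themselves honestly; a crawler that spoofs a browser User-Agent would pass straight through.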

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
MetaWhirledPeas ◴[] No.42551742[source]
> Cloudflare also has a feature to block known AI bots and even suspected AI bots

In addition to other crushing internet risks, add "wrongly blacklisted as a bot" to the list.

replies(4): >>42551773 #>>42552921 #>>42562510 #>>42564887 #
throwaway290 ◴[] No.42551773[source]
What do you mean crushing risk? Just solve these 12 puzzles by moving tiny icons on tiny canvas while on the phone and you are in the clear for a couple more hours!
replies(3): >>42552006 #>>42552586 #>>42552825 #
benhurmarcel ◴[] No.42552825[source]
Sometimes it doesn’t even give you a Captcha.

I have come across some websites that block me using Cloudflare with no way of solving it. I’m not sure why: I’m in a large first-world country, and I tried a stock iPhone and a stock Windows PC, with no VPN or anything.

There’s just no way to know.

replies(2): >>42555004 #>>42570541 #
dannyw ◴[] No.42555004[source]
That’s probably a page/site rule set by the website owner. Some sites block EU IPs as the costs of complying with GDPR outweigh the gain.
replies(2): >>42556915 #>>42557953 #
1. benhurmarcel ◴[] No.42557953{3}[source]
One of the affected websites is a local cafe in the EU. It doesn’t make any sense to block EU IPs.