770 points ta988 | 3 comments
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive and doesn't back off under load the way I would expect from Googlebot. It kept sending more and more requests until my server crashed, then backed off for a minute and started requesting again.
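
For illustration, the kind of back-off a polite crawler could apply might look like the sketch below. This is not how Googlebot or Meta's bot actually works; the function name `next_delay` and the `base` and `cap` values are assumptions made up for the example.

```python
def next_delay(current_delay, request_ok, base=1.0, cap=300.0):
    """Return the number of seconds to wait before the next request.

    Hypothetical policy: double the delay after a failed or overloaded
    response (up to `cap`), and halve it after a success (down to `base`).
    """
    if request_ok:
        return max(base, current_delay / 2)
    return min(cap, current_delay * 2)


# Example: three failures in a row, then one success.
delay = 1.0
for ok in (False, False, False, True):
    delay = next_delay(delay, ok)
```

The point is that the delay grows while the server is struggling and only shrinks gradually once it recovers, instead of resuming at full speed after a fixed pause.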

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt, but those are just suggestions and some bots seem to ignore them.
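
For anyone not behind Cloudflare, the same idea can be sketched at the application layer. This is a minimal WSGI middleware, not the Cloudflare rule itself; the names `BLOCKED_UA_SUBSTRINGS`, `is_blocked`, and `BlockUserAgents` are hypothetical, and the only token taken from this comment is the Meta-ExternalAgent name.

```python
# Hypothetical blocklist; matched case-insensitively as a substring.
BLOCKED_UA_SUBSTRINGS = ("meta-externalagent",)


def is_blocked(user_agent):
    """Return True if the User-Agent header contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


class BlockUserAgents:
    """WSGI middleware that answers 403 for blocked User-Agent headers."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Everything else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI app is then `app = BlockUserAgents(app)`. Like any User-Agent check, this only stops bots that identify themselves honestly, and the request still reaches your server before it is rejected.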

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
coldpie ◴[] No.42551412[source]
Imagine being one of the monsters who works at Facebook and thinking you're not one of the evil ones.
replies(3): >>42551437 #>>42551684 #>>42551823 #
1. Aeolun ◴[] No.42551823[source]
Well, Facebook actually releases their models instead of seeking rent off them, so I’m sort of inclined to say Facebook is one of the less evil ones.
replies(1): >>42551949 #
2. echelon ◴[] No.42551949[source]
> releases their models

Some of them, and initially only by accident. And without the ingredients to create your own.

Meta is trying to kill OpenAI and any new FAANG contenders. They'll commoditize their complement until the earth is thoroughly salted, and emerge as one of the leading players in the space due to their data, talent, and platform incumbency.

They're one of the distribution networks for AI, so they're going to win even by just treading water.

I'm glad Meta is releasing models, but don't assume their position is motivated entirely by goodwill. They want to win.

replies(1): >>42563387 #
3. int_19h ◴[] No.42563387[source]
FWIW, there's considerable doubt that the initial LLaMA "leak" was accidental, based on Meta's subsequent reaction.

I mean, the comment with a direct download link in their GitHub repo stayed up despite all the visibility (it had tons of upvotes).