←back to thread

770 points ta988 | 3 comments | | HN request time: 0s | source
Show context
markerz ◴[] No.42551173[source]
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...

It seems a bit naive for some reason and doesn't do performance back-off the way I would expect from Google Bot. It just kept repeatedly requesting more and more until my server crashed, then it would back off for a minute and then request more again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

replies(14): >>42551260 #>>42551410 #>>42551412 #>>42551513 #>>42551649 #>>42551742 #>>42552017 #>>42552046 #>>42552437 #>>42552763 #>>42555123 #>>42562686 #>>42565119 #>>42572754 #
bodantogat ◴[] No.42551649[source]
I see a lot of traffic I can tell are bots based on the URL patterns they access. They do not include the "bot" user agent, and often use residential IP pools. I haven't found an easy way to block them. They nearly took out my site a few days ago too.
replies(5): >>42551680 #>>42551803 #>>42556117 #>>42558781 #>>42574346 #
echelon ◴[] No.42551803[source]
You could run all of your content through an LLM to create a twisted and purposely factually incorrect rendition of your data. Forward all AI bots to the junk copy.

Everyone should start doing this. Once the AI companies engorge themselves on enough garbage and start to see a negative impact to their own products, they'll stop running up your traffic bills.

Maybe you don't even need a full LLM. Just a simple transformer that inverts negative and positive statements, changes nouns such as locations, and subtly nudges the content into an erroneous state.

replies(5): >>42551837 #>>42551968 #>>42552052 #>>42553499 #>>42553755 #
1. tivert ◴[] No.42553755[source]
> You could run all of your content through an LLM to create a twisted and purposely factually incorrect rendition of your data. Forward all AI bots to the junk copy.

> Everyone should start doing this. Once the AI companies engorge themselves on enough garbage and start to see a negative impact to their own products, they'll stop running up your traffic bills.

I agree, and not just to discourage them running up traffic bills. The end-state of what they hope to build is very likely to be extremely for most regular people [1], so we shouldn't cooperate in building it.

[1] And I mean end state. I don't care how much value you say you get from some AI coding assistant today, the end state is your employer happily gets to fire you and replace you with an evolved version of the assistant at a fraction of your salary. The goal is to eliminate the cost that is our livelihoods. And if we're lucky, in exchange we'll get a much reduced basic income sufficient to count the rest of our days from a dense housing project filled with cheap minimum-quality goods and a machine to talk to if we're sad.

replies(1): >>42577136 #
2. danlugo92 ◴[] No.42577136[source]
If your employer can run their companies without employees in the future it also means you can have your own company with no employees.

If anything this will level the playing field, and creativity will prevail.

replies(1): >>42578229 #
3. tivert ◴[] No.42578229[source]
> If your employer can run their companies without employees in the future it also means you can have your own company with no employees.

No, you still need money. Lots of money.

> If anything this will level the playing field, and creativity will prevail.

That's a fantasy. The people that already have money will prevail (for the most part).