The web does not need gatekeepers: Cloudflare’s new “signed agents” pitch

> But the reality is how can someone small protect their blog or content from AI training bots?

First off, there's no harm from well-behaved bots. Badly behaved bots that cause problems for the server are easily detected (by the problems they cause), classified, and blocked or heavily throttled.
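
To make "detected by the problems they cause" concrete: it can be as simple as counting requests per client and cutting off whoever blows past a budget. A rough sketch, assuming a hypothetical client key (an IP address or user agent string) and made-up thresholds; the class and method names are mine, not any particular server's API:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Refuses requests from clients that exceed max_requests per window."""

    def __init__(self, max_requests=60, window_seconds=60.0):
        # Thresholds are illustrative assumptions; tune them to your server.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # client key -> recent request timestamps

    def allow(self, client, now=None):
        """Return True if the request is within budget, False to throttle it."""
        now = time.monotonic() if now is None else now
        recent = self.hits[client]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over budget: throttle or block this request
        recent.append(now)
        return True
```

A well-behaved crawler that paces itself never hits the limit; one that hammers the server starts getting `False` back, at which point you can answer with a 429 or drop the connection.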

Of course, if you mean "protect" in the sense of "keep AI companies from getting a copy" (which you may have, given that you mentioned training), you simply can't, unless you consider "don't put it on the web" a solution.

It's impossible to make something "public, but not like that". Either you publish or you don't.

If anything, it's a legal issue (copyright/fair use), not a technical one. Technical solutions won't work.

I'm not sure why people are so confused by this. The Mastodon/ActivityPub userbase put their public content on a publicly federated protocol, then lost their shit and sent me death threats when I spidered and indexed it for network-wide search.

There are upsides and downsides to publishing things you create. One of the downsides is that it will be public and accessible to everyone.