454 points positiveblue | 5 comments
TIPSIO No.45066555
Everyone loves the dream of a free for all and open web.

But the reality is how can someone small protect their blog or content from AI training bots? E.g.: They just blindly trust someone is sending Agent vs Training bots and super duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers as "licensed data"?

Unless you're Reddit, X, Google, or Meta, with scary unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

1. rsync No.45068077
“But the reality is how can someone small protect their blog or content from AI training bots?”

Why would you need to?

If your inability to assemble basic HTML forces you to adopt enormous, bloated frameworks that require two full CPU cores to render your post…

… or if you think your online missives are a step in the road to content creator riches …

… then I suppose I see the problem.

Otherwise there’s no problem.

2. lovich No.45068251
So by a free and open for all web you mean only for the tech priests competent enough to build the skills and maintain them in light of changes to the spec (hope these people didn’t run across XML/XSLT-dependent techniques while building their site), or for those with a rich enough family that they can casually learn a skill without worrying about putting food on the table?

There are going to be bad actors taking advantage of people who cannot fight back without regulations and gatekeepers; suggesting otherwise is about as reasonable as the ancap idea of government.

3. nc0 No.45068511
It's not a question of languages or frameworks, but of hardware. I cannot finance servers large enough to keep up with AI bots constantly scraping my host, bypassing cache directives, or changing IPs to avoid bans.
4. jeroenhd No.45072684
I have had to disable at least one service because AI bots kept hitting it and it started impacting other stuff I was running that I am more interested in. Part of it was the CPU load on the database rendering dozens of 404s per second (which still required a database call), part of it was that the thumbnail images were being queried over and over again with seemingly different parameters for no reason.

I'm sure there are AI bots that are good and respect the websites they operate on. Most of them don't seem to, and I don't care enough about the AI bubble to support them.

When AI companies stop people from using them as cheap scrapers, I'll rethink my position. So far, there's no way to distinguish any good AI bot from a bad one.

5. account42 No.45091347
> Part of it was the CPU load on the database rendering dozens of 404s per second (which still required a database call)

That's one request every 80 ms, which is an eternity in CPU time. How the hell can you not afford to check that something doesn't exist every 80 ms?
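
One way to make that "doesn't exist" check cheap is a small negative cache in front of the database. What follows is only a sketch under assumptions: `NegativeCache`, `handle`, and `db_lookup` are hypothetical names, and nothing here reflects how the service in question was actually built. It also only helps when bots repeat the same missing URLs; a crawler inventing a fresh path on every request will still reach the database.

```python
import time


class NegativeCache:
    """Remembers paths known to be missing for `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl=60.0, max_entries=10_000, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._missing = {}  # path -> expiry timestamp

    def is_known_missing(self, path):
        expiry = self._missing.get(path)
        if expiry is None:
            return False
        if expiry <= self.clock():
            # Entry is stale: forget it so the next lookup rechecks the database.
            del self._missing[path]
            return False
        return True

    def remember_missing(self, path):
        if len(self._missing) >= self.max_entries:
            # Evict the oldest inserted entry (dicts preserve insertion order).
            self._missing.pop(next(iter(self._missing)))
        self._missing[path] = self.clock() + self.ttl


def handle(path, cache, db_lookup):
    """Return (status, row); `db_lookup` is an assumed callable returning None on a miss."""
    if cache.is_known_missing(path):
        return 404, None  # answered from memory, no database call
    row = db_lookup(path)
    if row is None:
        cache.remember_missing(path)
        return 404, None
    return 200, row
```

The TTL bounds how long a newly created page could be wrongly reported as missing, and `max_entries` keeps a flood of distinct bad URLs from growing memory without limit.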

> part of it was that the thumbnail images were being queried over and over again with seemingly different parameters for no reason.

Is there a reason you are serving thumbnails for arbitrary parameters?
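
If arbitrary parameters aren't actually needed, one common approach is to snap every request to a short allow-list of sizes so that the number of distinct thumbnails per image stays bounded and cacheable. A minimal sketch, assuming a single width parameter; `ALLOWED_WIDTHS`, `normalize_width`, and `thumbnail_cache_key` are made-up names for illustration, not anything from the service being discussed.

```python
# Hypothetical allow-list; the real set of sizes depends on the site's layout.
ALLOWED_WIDTHS = (64, 128, 256, 512)


def normalize_width(requested):
    """Snap a requested width to the smallest allowed width that covers it.

    Unparseable input falls back to the smallest size, and oversized
    requests are capped at the largest one.
    """
    try:
        width = int(requested)
    except (TypeError, ValueError):
        return ALLOWED_WIDTHS[0]
    for allowed in ALLOWED_WIDTHS:
        if width <= allowed:
            return allowed
    return ALLOWED_WIDTHS[-1]


def thumbnail_cache_key(image_id, requested_width):
    """Build a cache key from the normalized width, so there are at most
    len(ALLOWED_WIDTHS) variants per image no matter what a bot sends."""
    return f"{image_id}:{normalize_width(requested_width)}"
```

With this in place, requests for widths 129, 200, and 256 all map to the same cached thumbnail instead of triggering three separate resizes.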