←back to thread

454 points positiveblue | 1 comments | | HN request time: 0s | source
Show context
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free for all and open web.

But the reality is how can someone small protect their blog or content from AI training bots? E.g.: They just blindly trust someone is sending Agent vs Training bots and super duper respecting robots.txt? Get real...

Or, fine what if they do respect robots.txt, but they buy the data that may or may not have been shielded through liability layers via "licensed data"?

Unless you're reddit, X, Google, or Meta with scary unlimited budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

replies(37): >>45066600 #>>45066626 #>>45066827 #>>45066906 #>>45066945 #>>45066976 #>>45066979 #>>45067024 #>>45067058 #>>45067180 #>>45067399 #>>45067434 #>>45067570 #>>45067621 #>>45067750 #>>45067890 #>>45067955 #>>45068022 #>>45068044 #>>45068075 #>>45068077 #>>45068166 #>>45068329 #>>45068436 #>>45068551 #>>45068588 #>>45069623 #>>45070279 #>>45070690 #>>45071600 #>>45071816 #>>45075075 #>>45075398 #>>45077464 #>>45077583 #>>45080415 #>>45101938 #
rsync ◴[] No.45068077[source]
“But the reality is how can someone small protect their blog or content from AI training bots?”

Why would you need to?

If your inability to assemble basic HTML forces you to adopt enormous, bloated frameworks that require two full cores of a cpu to render your post…

… or if you think your online missives are a step in the road to content creator riches …

… then I suppose I see the problem.

Otherwise there’s no problem.

replies(3): >>45068251 #>>45068511 #>>45072684 #
1. nc0 ◴[] No.45068511[source]
It's not a question of languages or frameworks, but hardware. I cannot finance servers large enough to keep up with AI bots constantly scrapping my host, bypassing cache indications, or changing IP to avoid bans.