←back to thread

454 points positiveblue | 1 comments | | HN request time: 0.271s | source
Show context
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free for all and open web.

But the reality is how can someone small protect their blog or content from AI training bots? E.g.: They just blindly trust someone is sending Agent vs Training bots and super duper respecting robots.txt? Get real...

Or, fine what if they do respect robots.txt, but they buy the data that may or may not have been shielded through liability layers via "licensed data"?

Unless you're reddit, X, Google, or Meta with scary unlimited budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

replies(37): >>45066600 #>>45066626 #>>45066827 #>>45066906 #>>45066945 #>>45066976 #>>45066979 #>>45067024 #>>45067058 #>>45067180 #>>45067399 #>>45067434 #>>45067570 #>>45067621 #>>45067750 #>>45067890 #>>45067955 #>>45068022 #>>45068044 #>>45068075 #>>45068077 #>>45068166 #>>45068329 #>>45068436 #>>45068551 #>>45068588 #>>45069623 #>>45070279 #>>45070690 #>>45071600 #>>45071816 #>>45075075 #>>45075398 #>>45077464 #>>45077583 #>>45080415 #>>45101938 #
wvenable ◴[] No.45067955[source]
> Everyone loves the dream of a free for all and open web... But the reality is how can someone small protect their blog or content from AI training bots?

Aren't these statements entirely in conflict? You either have a free for all open web or you don't. Blocking AI training bots is not free and open for all.

replies(8): >>45067998 #>>45068139 #>>45068376 #>>45068589 #>>45068929 #>>45069170 #>>45073712 #>>45074969 #
BrenBarn ◴[] No.45067998[source]
I think that was the point. Everyone loves the dream, but the reality is different.
replies(1): >>45068015 #
wilson090 ◴[] No.45068015[source]
How so? If you don't want AI bots reading information on the web, you don't actually want a free and open web. The reality of an open web is that such information is free and available for anyone.
replies(6): >>45068058 #>>45068155 #>>45068305 #>>45068547 #>>45068621 #>>45068828 #
gradstudent ◴[] No.45068058[source]
How is it available for everyone if the AI bots bring down your server?
replies(5): >>45068142 #>>45068202 #>>45068241 #>>45068453 #>>45068709 #
ForHackernews ◴[] No.45068453[source]
Rate-limits? Use a CDN? Lots of traffic can be a problem whether it's bots or humans.
replies(1): >>45069881 #
1. danudey ◴[] No.45069881[source]
You realize this entire thread is about a pitch from a CDN company trying to solve an issue that has presented itself at such a scale that this is the best option they can think of to keep the web alive, right?

"Use a CDN" is not sufficient when these bots are so incredibly poorly behaved, because you're still paying for that CDN and this bad behavior is going to cost you a fortune in CDN costs (or cost the CDN a fortune instead, which is why Cloudflare is suggesting this).