454 points positiveblue | 2 comments
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free for all and open web.

But the reality is how can someone small protect their blog or content from AI training bots? E.g., do they just blindly trust that someone is sending agent bots rather than training bots, and is super duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers as "licensed data"?

Unless you're Reddit, X, Google, or Meta, with scary unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

wvenable ◴[] No.45067955[source]
> Everyone loves the dream of a free for all and open web... But the reality is how can someone small protect their blog or content from AI training bots?

Aren't these statements entirely in conflict? You either have a free for all open web or you don't. Blocking AI training bots is not free and open for all.

BrenBarn ◴[] No.45067998[source]
I think that was the point. Everyone loves the dream, but the reality is different.
wilson090 ◴[] No.45068015[source]
How so? If you don't want AI bots reading information on the web, you don't actually want a free and open web. The reality of an open web is that such information is free and available for anyone.
gradstudent ◴[] No.45068058[source]
How is it available for everyone if the AI bots bring down your server?
1. mikestorrent ◴[] No.45068202[source]
Ultimately, you have to realize that this is a losing battle, unless we have completely draconian control over every piece of silicon. Captchas are being defeated; at this point they're basically just mechanisms to prove you Really Want to Make That Request to the extent that you'll spend some compute time on it, which is starting to become a bit of a waste of electricity and carbon.

Talented people who want to scrape or bot things are going to find ways to make that traffic look human. If that comes in the form of mechanically driving the screen of a physical iPhone, so be it; many such cases already!

The techniques you need for preventing DDoS don't really need to differentiate much between bots and people unless you're being specifically targeted; Fail2Ban-style IP bans are still quite effective, and basic WAF functionality does a lot.
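
To make the Fail2Ban-style idea concrete, here is a rough sketch of the general pattern: count requests per IP in a sliding window and ban the address for a while once it crosses a threshold, without caring whether it is a bot or a person. This is not Fail2Ban's actual implementation (Fail2Ban works by matching log lines and updating firewall rules); the class name `SlidingWindowBanner`, its parameters, and the default thresholds are all made up for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowBanner:
    """Hypothetical per-IP limiter: too many hits in a window triggers a timed ban."""

    def __init__(self, max_requests=100, window_seconds=60.0, ban_seconds=600.0):
        # All three thresholds are illustrative assumptions, not recommendations.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._banned_until = {}          # ip -> time at which the ban expires

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` (seconds) is allowed."""
        until = self._banned_until.get(ip)
        if until is not None:
            if now < until:
                return False             # still banned
            del self._banned_until[ip]   # ban expired, start fresh

        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()

        if len(hits) > self.max_requests:
            self._banned_until[ip] = now + self.ban_seconds
            hits.clear()
            return False
        return True


# Example: at most 3 requests per 10 seconds, 30-second ban.
banner = SlidingWindowBanner(max_requests=3, window_seconds=10, ban_seconds=30)
results = [banner.allow("203.0.113.7", t) for t in (0, 1, 2, 3, 20, 33)]
```

In a real deployment the caller would pass something like `time.monotonic()` as `now`, and the ban would usually be enforced at the firewall or WAF rather than in application code.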

2. account42 ◴[] No.45091177[source]
Agreed, copyright issues need to be solved via legislation, and network abuse issues need to be solved by network operators. Trying to do an end run around either only makes the web worse for everyone.