454 points positiveblue | 10 comments
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free for all and open web.

But the reality is how can someone small protect their blog or content from AI training bots? E.g.: They just blindly trust someone is sending Agent vs Training bots and super duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers as "licensed data"?

Unless you're reddit, X, Google, or Meta with scary unlimited budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

replies(37): >>45066600 #>>45066626 #>>45066827 #>>45066906 #>>45066945 #>>45066976 #>>45066979 #>>45067024 #>>45067058 #>>45067180 #>>45067399 #>>45067434 #>>45067570 #>>45067621 #>>45067750 #>>45067890 #>>45067955 #>>45068022 #>>45068044 #>>45068075 #>>45068077 #>>45068166 #>>45068329 #>>45068436 #>>45068551 #>>45068588 #>>45069623 #>>45070279 #>>45070690 #>>45071600 #>>45071816 #>>45075075 #>>45075398 #>>45077464 #>>45077583 #>>45080415 #>>45101938 #
wvenable ◴[] No.45067955[source]
> Everyone loves the dream of a free for all and open web... But the reality is how can someone small protect their blog or content from AI training bots?

Aren't these statements entirely in conflict? You either have a free for all open web or you don't. Blocking AI training bots is not free and open for all.

replies(8): >>45067998 #>>45068139 #>>45068376 #>>45068589 #>>45068929 #>>45069170 #>>45073712 #>>45074969 #
BrenBarn ◴[] No.45067998[source]
I think that was the point. Everyone loves the dream, but the reality is different.
replies(1): >>45068015 #
wilson090 ◴[] No.45068015[source]
How so? If you don't want AI bots reading information on the web, you don't actually want a free and open web. The reality of an open web is that such information is free and available for anyone.
replies(6): >>45068058 #>>45068155 #>>45068305 #>>45068547 #>>45068621 #>>45068828 #
1. pton_xd ◴[] No.45068305[source]
> If you don't want AI bots reading information on the web, you don't actually want a free and open web.

This is such a bad faith argument.

We want a town center for the whole community to enjoy! What, you don't like those people shooting up drugs over there? But they're enjoying it too, this is what you wanted right? They're not harming you by doing their drugs. Everyone is enjoying it!

replies(4): >>45068568 #>>45068731 #>>45071476 #>>45071478 #
2. Loughla ◴[] No.45068568[source]
Set aside that there's a pretty big difference between AI scraping and illegal drug usage.

If the person using illegal drugs is in no way harming anyone but themselves and not being a nuisance, then yeah, I can get behind that. Put whatever you want in your body, just don't let it negatively impact anyone around you. Seems reasonable?

replies(2): >>45069179 #>>45071513 #
3. wvenable ◴[] No.45068731[source]
If an AI bot is accessing my site the way that regular users are accessing my site -- in other words everyone is using the town center as intended -- what is the problem?

There seems to be a lot of conflating of badly coded (intentionally or not) scrapers with AI. That is a problem that predates AI's existence.

replies(1): >>45073366 #
4. pavel_lishin ◴[] No.45069179[source]
> just don't let it negatively impact anyone around you.

Exactly! Which is why we don't want AI bots siphoning our bandwidth & processing power.

5. beeflet ◴[] No.45071476[source]
Clearly you don't want the whole community to enjoy it then. Openness is incompatible with keeping the riffraff out.
replies(1): >>45091079 #
6. immibis ◴[] No.45071478[source]
Unironically, if we want everyone to enjoy the town center, we should let people do drugs.
7. presentation ◴[] No.45071513[source]
I think this is actually a good example despite how stark the differences are - both the nuisance AI scrapers and the drug addicts impose negative externalities that they could in principle self-regulate but, for whatever reason, are proving unable to, and so they cause other people to have a bad time.

Other commenters are voicing the usual “drugs are freedom” type opinions, but having now lived in China and Japan, where drugs are dealt with very strictly (and which basically don’t have a drug problem today), I can see the other side of the argument: places that feel dirty and dangerous because of drugs - even if you think of addicts sympathetically as victims who need help - make everyone else less free to live the lifestyle they would like to have.

More freedom for one group (whether to ruin their own lives for a high, or to train their AI models) can mean less freedom for others (whether to feel safe walking in public streets, or to publish their little blog on the public internet).

8. integralid ◴[] No.45073366[source]
So if I buy a DDoS service and DDoS your site, it's ok as long as it accesses it the same way regular people do? I'm sorry for the extreme example, it's obviously not, but that's how I understand your position as written.

We can also consider 10 exploit attempts per second that my site sees.

replies(1): >>45077560 #
9. wvenable ◴[] No.45077560{3}[source]
The issue is that people seem to be conflating badly built scraper bots with AI. If an AI accessed my site as frequently as a normal human (or say Googlebot) then that particular complaint simply goes away. It never had anything to do with AI itself.
10. account42 ◴[] No.45091079[source]
It isn't incompatible at all. You might also be shocked to learn that all-you-can-eat buffets will kick you out if you grab all the food and dump it on your table.