454 points positiveblue | 2 comments
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free for all and open web.

But the reality is how can someone small protect their blog or content from AI training bots? E.g., do they just blindly trust that someone is honestly sending agent bots rather than training bots and super-duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers as "licensed data"?

Unless you're Reddit, X, Google, or Meta with scary, unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

wvenable ◴[] No.45067955[source]
> Everyone loves the dream of a free for all and open web... But the reality is how can someone small protect their blog or content from AI training bots?

Aren't these statements entirely in conflict? You either have a free for all open web or you don't. Blocking AI training bots is not free and open for all.

BrenBarn ◴[] No.45067998[source]
I think that was the point. Everyone loves the dream, but the reality is different.
wilson090 ◴[] No.45068015[source]
How so? If you don't want AI bots reading information on the web, you don't actually want a free and open web. The reality of an open web is that such information is free and available for anyone.
gradstudent ◴[] No.45068058[source]
How is it available for everyone if the AI bots bring down your server?
wvenable ◴[] No.45068709[source]
Is that really the problem we are discussing? I've had people attack my server and bring it down. But that has nothing to do with being free and open to everyone. A top Hacker News post could take my server down.
danudey ◴[] No.45069858[source]
Yes, because a top Hacker News post takes your server down because a large number of actual humans are looking to gain actual value from your posts. Meanwhile, you stand to benefit from the HN discussion by learning new things and perspectives from the community.

The AI bot assault, on the other hand, is one company (or a few companies) re-fetching the same data over and over again, constantly, in perpetuity, just in case it's changed, all so they can incorporate it into their training set and make money off of it while giving you zero credit and providing zero feedback.

wvenable ◴[] No.45070023[source]
But then we get to use those AI tools.

The refrain here comes down not to "AI" but mostly to "the AI bot assault", which is a different thing. Sure, let's have a discussion about badly behaved and overzealous web scrapers. As for credit, I've asked AI for its references and gotten them. If my information is merely mushed into an AI training model, I'm not sure why I need credit. If you discuss this thread with your friends, are you going to give me credit?

1. tsimionescu ◴[] No.45072463{3}[source]
No, you don't "get to" use the AI tools. You have to buy access to them (beyond some free trials).
2. wvenable ◴[] No.45077577[source]
Yes. I get to buy access to them. They're providing a service that is expensive to run and requires specialized expertise. I don't see the problem with that.