454 points positiveblue | 3 comments
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free for all and open web.

But the reality is: how can someone small protect their blog or content from AI training bots? E.g., do they just blindly trust that whoever is sending Agent vs. Training bots is labeling them honestly and super duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers via "licensed data"?
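To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler name `ExampleTrainingBot`, the `blog.example` URL, and the rules themselves are made up for illustration. The thing to notice is that the check runs on the crawler's side and is keyed on whatever name the crawler chooses to announce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a small blog (an assumption for this sketch):
# block one self-identified training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what a *cooperating* crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that announces itself honestly is told to stay out.
honest = is_allowed("ExampleTrainingBot", "https://blog.example/post")

# The same crawler announcing a browser-like name is told it may proceed.
disguised = is_allowed("Mozilla/5.0", "https://blog.example/post")

print(honest, disguised)
```

Nothing here is enforced by the server: a crawler that never runs this check, or runs it under another name, is not stopped by the file at all.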

Unless you're Reddit, X, Google, or Meta with scary unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

wvenable ◴[] No.45067955[source]
> Everyone loves the dream of a free for all and open web... But the reality is how can someone small protect their blog or content from AI training bots?

Aren't these statements entirely in conflict? You either have a free for all open web or you don't. Blocking AI training bots is not free and open for all.

BrenBarn ◴[] No.45068929[source]
No, that is not true. It is only true if you just equate "AI training bots" with "people" on some kind of nominal basis without considering how they operate in practice.

It is like saying "If your grocery store is open to the public, why is it not open to this herd of rhinoceroses?" Well, the reason is that rhinoceroses are simply not going to stroll up and down the aisles and head to the checkout line quietly with a box of cereal and a few bananas. They're going to knock over displays and maybe even shelves and they're going to damage goods and generally make the grocery store unusable for everyone else. You can say "Well, then your problem isn't rhinoceroses, it's entities that damage the store and impede others from using it" and I will say "Yes, and rhinoceroses are in that group, so they are banned".

It's certainly possible to imagine a world where AI bots use websites in more acceptable ways --- in fact, it's more or less the world we had prior to about 2022, where scrapers did exist but were generally manageable with widely available techniques. But that isn't the world that we live in today. It's also certainly true that many humans are using websites in evil ways (notably including the humans who are controlling many of these bots), and it's also very true that those humans should be held accountable for their actions. But that doesn't mean that blocking bots makes the internet somehow unfree.

This type of thinking that freedom means no restrictions makes sense only in a sort of logical dreamworld disconnected from practical reality. It's similar to the idea that "freedom" in the socioeconomic sphere means the unrestricted right to do whatever you please with resources you control. Well, no, that is just your freedom. But freedom globally construed requires everyone to have autonomy and be able to do things, not just those people with lots of resources.

akoboldfrying ◴[] No.45073489[source]
> "If your grocery store is open to the public, why is it not open to this herd of rhinoceroses?"

What this scenario actually reveals is that the words "open to the public" are not intended to mean "access is completely unrestricted".

It's fine to not want to give completely unrestricted access to something. What's not fine, or at least what complicates things unnecessarily, is using words like "open and free" to describe this desired actually-we-do-want-to-impose-certain-unstated-restrictions contract.

I think people use words like "open and free" to describe the actually-restricted contracts they want to have because they're often among like-minded people for whom these unstated additional restrictions are tacitly understood -- or, simply because it sounds good. But for precise communication with a diverse audience, using this kind of language is at best confusing, at worst disingenuous.

1. crote ◴[] No.45074012[source]
Nobody has ever meant "access is completely unrestricted".

As a trivial example: what website is going to welcome DDoS attacks or hacking attempts with open arms? Is a website no longer "open to the public" if it has DDoS protection or a WAF? What if the DDoS makes the website unavailable to the vast majority of users: does blocking the DDoS make it more or less open?
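As a rough illustration of the kind of protection meant here, below is a minimal per-client sliding-window rate limiter in Python. It is only a sketch: the class name, the limit of 5 requests per second, and the client labels are assumptions for the example, not how any particular WAF or DDoS product works.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str, now: float) -> bool:
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Illustrative numbers only: 5 requests per second per client.
limiter = SlidingWindowLimiter(max_requests=5, window_seconds=1.0)

# A human-paced reader (one page every 10 seconds) is never refused.
reader_ok = all(limiter.allow("reader", now=t * 10.0) for t in range(3))

# A client firing 100 requests within a tenth of a second is mostly refused.
flood_results = [limiter.allow("flooder", now=0.001 * i) for i in range(100)]

print(reader_ok, sum(flood_results))
```

The ordinary visitor never notices the limit; only the flood is turned away, which is the sense in which the restriction keeps the site available to everyone else.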

Similarly, if a concert is "open to the public", does that mean they'll be totally fine with you bringing a megaphone and yelling through the performance? Will they be okay with you setting the stage on fire? Will they just stand there and say "aw shucks" if you start blocking other people from entering?

You can try to rules-lawyer your way around commonly-understood definitions, but deliberately and obtusely misinterpreting such phrasing isn't going to lead to any kind of productive discussion.

2. akoboldfrying ◴[] No.45074179[source]
>You can try to rules-lawyer your way around commonly-understood definitions

Despite your assertions to the contrary, "actually free to use for any purpose" is a commonly understood interpretation of "free to use for any purpose" -- see permissive software licenses, where licensors famously don't get to say "But I didn't mean big companies get to use it for free too!"

The onus is on the person using a term like "free" or "open" to clarify the restrictions they actually intend, if any. Putting the onus anywhere else immediately opens the way for misunderstandings, accidental or otherwise.

To make your concert analogy actually fit: A scraper is like a company that sends 1000 robots with tape recorders to your "open to the public" concert. They do only the things an ordinary member of the public does; they can't do anything else. The most "damage" they can do is to keep humans who would enjoy the concert from being able to attend if there aren't enough seats; whatever additional costs they cause (air conditioning, let's say) are the same as the costs that would have been incurred by that many humans.

3. amiga386 ◴[] No.45075247[source]
> To make your concert analogy actually fit: A scraper is like a company that sends 1000 robots with tape recorders to your "open to the public" concert.

The scraper is sending ten million robots to your concert. They're packing out every area of space, they're up on the stage, they're in all the vestibules and toilets even though they don't need to go. They've completely crowded out all the humans, who are the ones who actually need to see the concert.

You'd have been fine with a few robots. It used to be the case that companies would send one robot each, and even though they were videotaping, they were discreet about it and didn't get in the humans' way.

Now some imbecile is sending millions of robots, instead of just one with a video camera. All the robots wear the scraper's company uniform at first, so to deal with this problem you tell all robots wearing it to go home. Then they all come back dressed identically to the humans in the queue, as they jump ahead of them, to deliberately disguise who they are because they know you'll kick them out. They're not taking no for an answer, and they're going to use their sheer mass and numbers to block out your concert. Nobody seems to know why they do it, and nobody knows who is sending the robots for sure, because the robot owners all deny the robots are theirs. But somebody is sending them.
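In HTTP terms, the "uniform" is roughly the User-Agent header. A minimal Python sketch of why blocking by uniform stops working once the robots change clothes follows. The crawler name `ExampleScraper` and the header strings are invented for the example, and the blocklist check is an assumed, simplified stand-in for whatever rule a site actually uses.

```python
# Hypothetical blocklist of crawler names (an assumption for this sketch).
BLOCKED_AGENT_SUBSTRINGS = ("examplescraper",)


def is_blocked(user_agent: str) -> bool:
    """Block a request if its User-Agent contains a blocklisted name."""
    ua = user_agent.lower()
    return any(name in ua for name in BLOCKED_AGENT_SUBSTRINGS)


# The robot in company uniform: easy to spot and send home.
uniformed = "ExampleScraper/1.0 (+https://scraper.example/bot)"

# A typical-looking browser header, as a human visitor might send.
human = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# The same robot after changing clothes: it simply copies the human's header.
disguised = human

print(is_blocked(uniformed), is_blocked(disguised), is_blocked(human))
```

Once the header is copied, the rule cannot tell the disguised robot from the person next to it in the queue, so any rule strict enough to catch the robot on this signal alone would also turn the human away.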