
454 points | positiveblue | 2 comments
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free-for-all and open web.

But the reality is: how is someone small supposed to protect their blog or content from AI training bots? E.g., do they just blindly trust that whatever is crawling them is an Agent rather than a Training bot and is super duper respecting robots.txt? Get real...
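About the only self-serve lever here is server-side user-agent filtering, and it still depends on the crawler identifying itself honestly, which is exactly the trust problem above. A minimal sketch, assuming the substrings below match user agents their operators have publicly documented and that a plain WSGI app stands in for the real site:

    # Minimal WSGI sketch: reject requests whose User-Agent contains a known
    # AI-training crawler token. The list is illustrative, not exhaustive,
    # and a dishonest crawler can simply spoof its User-Agent.
    from wsgiref.simple_server import make_server

    BLOCKED_UA_SUBSTRINGS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")

    def app(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(token in ua for token in BLOCKED_UA_SUBSTRINGS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"AI training crawlers are not welcome here.\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello, human (or honest agent).\n"]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()

Anything that lies about its User-Agent sails straight through, which is the original point.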

Or fine, what if they do respect robots.txt, but then buy data that may or may not have been shielded through liability layers as "licensed data"?

Unless you're Reddit, X, Google, or Meta, with scary unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

replies(37): >>45066600 #>>45066626 #>>45066827 #>>45066906 #>>45066945 #>>45066976 #>>45066979 #>>45067024 #>>45067058 #>>45067180 #>>45067399 #>>45067434 #>>45067570 #>>45067621 #>>45067750 #>>45067890 #>>45067955 #>>45068022 #>>45068044 #>>45068075 #>>45068077 #>>45068166 #>>45068329 #>>45068436 #>>45068551 #>>45068588 #>>45069623 #>>45070279 #>>45070690 #>>45071600 #>>45071816 #>>45075075 #>>45075398 #>>45077464 #>>45077583 #>>45080415 #>>45101938 #
1. h4ck_th3_pl4n3t ◴[] No.45077583[source]
The root cause is really about _who_ has to pay for the traffic, and currently that's the hosting end. If you turn that model around, AI web scrapers suddenly have to behave, and most of the problems we currently have are more or less solved(?): there is no longer an incentive to scrape datasets that others put together, and instead there is a payment incentive to buy high-quality datasets.
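A minimal sketch of what that flipped model could look like at the HTTP level, assuming a hypothetical X-Payment-Token header and some out-of-band way to buy tokens; 402 Payment Required is an existing, if rarely used, status code:

    # Sketch of a "requester pays" gate: requests without a valid payment
    # token get 402 Payment Required. The X-Payment-Token header and the
    # in-memory token set are hypothetical stand-ins for a real billing system.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAID_TOKENS = {"demo-token-123"}  # placeholder for purchased access tokens

    class PayToCrawl(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.headers.get("X-Payment-Token") in PAID_TOKENS:
                body = b"Content served; the requester covered the traffic.\n"
                status = 200
            else:
                body = b"Bulk access to this site requires a paid token.\n"
                status = 402  # Payment Required
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), PayToCrawl).serve_forever()

A real deployment would need metering, token issuance, and a way to exempt ordinary readers, which is exactly the objection in the reply below.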
replies(1): >>45091544 #
2. account42 ◴[] No.45091544[source]
But I don't want to make the human users of my website pay for the traffic, for the same reason I donate to real-world charities that I believe in.