
454 points | positiveblue
TIPSIO:
Everyone loves the dream of a free-for-all, open web.

But the reality is: how can someone small protect their blog or content from AI training bots? Are they supposed to blindly trust that a crawler declaring itself an agent rather than a training bot is telling the truth and dutifully respecting robots.txt? Get real...
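For what it's worth, the opt-out itself is just a few lines of robots.txt listing the crawlers' published user-agent tokens. A sketch below (token names as the vendors document them at the moment, and purely advisory, which is the whole problem):

    # robots.txt -- asks AI training crawlers to stay out.
    # Advisory only: compliant bots skip the site, everyone else ignores it.
    User-agent: GPTBot           # OpenAI's training crawler
    Disallow: /

    User-agent: CCBot            # Common Crawl
    Disallow: /

    User-agent: Google-Extended  # Google's AI-training opt-out token
    Disallow: /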

Or, fine, say they do respect robots.txt, but then buy data that may or may not have been laundered through liability layers as "licensed data"?

Unless you're Reddit, X, Google, or Meta, with scary unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

gausswho:
What we need is some legal teeth behind robots.txt. It won't stop everyone, but Big Corp would be a tasty target for lawsuits.
notatoad:
It wouldn’t stop anyone. The bots you want to block already operate out of places where those laws wouldn’t be enforced.
qbane:
Then that's a good reason to deny requests from those IPs.
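If you go that route, the check itself is cheap. Here's a minimal sketch in Python using the standard-library ipaddress module; the CIDR ranges are documentation placeholders, not a real blocklist, which you'd have to source from published bot or ASN prefix lists:

    import ipaddress

    # Placeholder prefixes (RFC 5737 TEST-NET ranges). In practice you'd
    # load published datacenter/ASN ranges for the crawlers you want gone.
    BLOCKED_NETS = [
        ipaddress.ip_network("203.0.113.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ]

    def is_blocked(client_ip: str) -> bool:
        """Return True if the client IP falls inside any blocked prefix."""
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in BLOCKED_NETS)

    # In a request handler, hypothetically:
    # if is_blocked(request.remote_addr):
    #     return Response(status=403)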
literalAardvark:
I've run a few hundred small domains for various online stores on an older backend that didn't scale well under crawler load, and at some point we started blocking by continent.

It's getting really, really ugly out there.
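
For anyone curious what blocking by continent looks like in practice, here's a hedged sketch assuming MaxMind's geoip2 Python package and a free GeoLite2 country database (the comment above doesn't say what stack they actually used):

    import geoip2.database
    import geoip2.errors

    # Requires: pip install geoip2, plus a GeoLite2-Country.mmdb file
    # downloaded from MaxMind (free tier with an account).
    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

    # Placeholder; valid continent codes are AF, AN, AS, EU, NA, OC, SA.
    BLOCKED_CONTINENTS = {"AN"}

    def blocked_by_continent(client_ip: str) -> bool:
        """Look up the client's continent code and compare to the blocklist."""
        try:
            continent = reader.country(client_ip).continent.code
        except geoip2.errors.AddressNotFoundError:
            return False  # unknown IPs are allowed through
        return continent in BLOCKED_CONTINENTS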