
454 points | positiveblue | 2 comments
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free-for-all, open web.

But the reality is: how can someone small protect their blog or content from AI training bots? E.g., do they just blindly trust that someone is honestly labeling Agent vs. Training bots and super duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers as "licensed data"?

Unless you're Reddit, X, Google, or Meta with scary, unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

Gud ◴[] No.45066626[source]
By developing Free Software that combats this hostile software.

Corporations develop hostile AI agents,

Capable hackers develop anti-AI-agents.

Enough of this defeatist attitude of "we have no power".

victorbjorklund ◴[] No.45067374[source]
So basically Cloudflare but self-hosted (with all the pain that comes from that)?
Gud ◴[] No.45067532[source]
What’s so painful about self-hosting? I’ve been self-hosting since before I hit puberty. If 12-year-old me can run an httpd, anyone can.

And if you don’t want to self-host, at least try to use services from organisations that aren’t hostile to the open web.

victorbjorklund ◴[] No.45067558[source]
I self-host lots of stuff. But yes, it is more painful to host a WAF that can handle billions of requests per minute. It is even harder to do it for free like Cloudflare. And in the end, the result for the user is exactly the same whether you use a self-hosted WAF or let someone else host it for you.
lucb1e ◴[] No.45068536[source]
If you're handling billions of requests per second, you're not a self-hoster. That's a commercial service with a dedicated team to handle traffic around the clock. Most ISPs probably don't even operate lines that big.

To put that in perspective, even if they're sending empty TCP packets, "several billion" pps is 200 to 1800 gigabits per second of traffic, depending on what you mean by that. Add a cookieless HTTP payload and you're at many terabits per second. The average self-hoster is more likely to get struck by lightning than to encounter this and need protection from it (even without considering the probably modest consequences of being offline for a few hours if it does happen).

Edit: off by a factor of 60, whoops. Thanks to u/Gud for pointing that out. I stand by the conclusion though: this is less likely to occur than getting struck by lightning (or maybe it's around equally likely now? But somewhere in that ballpark), and the consequences of being down for a few hours are generally not catastrophic anyway. You can always still put big brother in front if this does happen to you and your ISP can't quickly drop the abusive traffic.
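
For anyone who wants to redo the arithmetic, here is a rough sketch of the packet-rate-to-bandwidth conversion. The two packet sizes are assumptions chosen for illustration, not the figures behind the estimate above: 40 bytes is a bare IPv4 + TCP header with no options or payload, and 84 bytes is what a minimum-size Ethernet frame occupies on the wire (64-byte frame plus 8 bytes of preamble and a 12-byte inter-frame gap). The helper names are made up for this example.

```python
# Assumed sizes (illustrative, not taken from the thread):
IP_TCP_HEADER_BYTES = 40      # 20-byte IPv4 header + 20-byte TCP header, no payload
ETHERNET_MIN_WIRE_BYTES = 84  # 64-byte minimum frame + 8 preamble + 12 inter-frame gap


def per_minute_to_per_second(rate_per_minute):
    """Convert a per-minute rate to a per-second rate (the factor of 60)."""
    return rate_per_minute / 60


def gbps(packets_per_second, bytes_per_packet):
    """Bandwidth in gigabits per second for a given packet rate and size."""
    return packets_per_second * bytes_per_packet * 8 / 1e9


one_billion = 1e9
rates = {
    "1 billion packets per second": one_billion,
    "1 billion packets per minute": per_minute_to_per_second(one_billion),
}

for label, pps in rates.items():
    low = gbps(pps, IP_TCP_HEADER_BYTES)
    high = gbps(pps, ETHERNET_MIN_WIRE_BYTES)
    print(f"{label}: {low:.1f} to {high:.1f} Gbit/s")
```

Multiply by however many billions you have in mind. Under these assumptions the per-minute reading lands in the range of a handful of 10 Gbit/s links per billion packets, while the per-second reading is sixty times that.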

1. Gud ◴[] No.45068644[source]
To be fair, he did say per minute :-)
2. lucb1e ◴[] No.45068787[source]
Oh, whoops. Divide everything by 60, quick!

That does make it a bit less ludicrous, even if I think the conclusion of my response still applies.