
454 points | positiveblue
TIPSIO ◴[] No.45066555[source]
Everyone loves the dream of a free-for-all, open web.

But in reality, how can someone small protect their blog or content from AI training bots? E.g., do they just blindly trust that someone is honestly sending an Agent bot rather than a Training bot, and super duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers via "licensed data"?

Unless you're Reddit, X, Google, or Meta with scary, unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

replies(37): >>45066600 #>>45066626 #>>45066827 #>>45066906 #>>45066945 #>>45066976 #>>45066979 #>>45067024 #>>45067058 #>>45067180 #>>45067399 #>>45067434 #>>45067570 #>>45067621 #>>45067750 #>>45067890 #>>45067955 #>>45068022 #>>45068044 #>>45068075 #>>45068077 #>>45068166 #>>45068329 #>>45068436 #>>45068551 #>>45068588 #>>45069623 #>>45070279 #>>45070690 #>>45071600 #>>45071816 #>>45075075 #>>45075398 #>>45077464 #>>45077583 #>>45080415 #>>45101938 #
Gud ◴[] No.45066626[source]
By developing Free Software that combats this hostile software.

Corporations develop hostile AI agents,

Capable hackers develop anti-AI-agents.

This defeatist attitude: "we have no power".

replies(7): >>45066667 #>>45066678 #>>45066770 #>>45066789 #>>45066830 #>>45067106 #>>45067374 #
victorbjorklund ◴[] No.45067374[source]
So basically cloudflare but self-hosted (with all the pain that comes from that)?
replies(1): >>45067532 #
Gud ◴[] No.45067532[source]
What’s so painful about self hosting? I’ve been self hosting since before I hit puberty. If 12 year old me can run an httpd, anyone can.

And if you don’t want to self host, at least try to use services from organisations that aren’t hostile to the open web.

replies(2): >>45067558 #>>45067559 #
victorbjorklund ◴[] No.45067558[source]
I self-host lots of stuff. But yes, it is more painful to host a WAF that can handle billions of requests per minute. It's even harder to do it for free like Cloudflare. And in the end, the result for the user is exactly the same whether you use a self-hosted WAF or let someone else host it for you.
replies(2): >>45068536 #>>45069544 #
lucb1e ◴[] No.45068536[source]
If you're handling billions of requests per second, you're not a self hoster. That's a commercial service with a dedicated team to handle traffic around the clock. Most ISPs probably don't even operate lines that big

To put that in perspective, even if they're sending empty TCP packets, "several billion" pps is 200 to 1800 gigabits of traffic, depending on what you mean by that. Add a cookieless HTTP payload and you're at many terabits per second. The average self hoster is more likely to get struck by lightning than encounter and need protection from this (even without considering the, probably modest, consequences of being offline a few hours if it does happen)

Edit: off by a factor of 60, whoops. Thanks to u/Gud for pointing that out. I stand by the conclusion though: less likely to occur than getting struck by lightning (or maybe it's around equally likely now? But somewhere in that ballpark) and the consequences of being down for a few hours are generally not catastrophic anyway. You can always still put big brother in front if this event does happen to you and your ISP can't quickly drop the abusive traffic
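For anyone who wants to redo that arithmetic, here is a rough sketch. The function name and the per-packet sizes are my own assumptions, not figures from the comment: 40 bytes is a bare IPv4 + TCP header with no options, and 84 bytes is a minimum Ethernet frame plus preamble and inter-frame gap. Other assumptions will move the numbers.

```python
# Back-of-the-envelope bandwidth for a flood of empty TCP packets.
# All sizes below are assumptions chosen for illustration.

TCP_IP_HEADERS = 40    # bytes: 20-byte IPv4 header + 20-byte TCP header, no options
ETHERNET_ON_WIRE = 84  # bytes: 64-byte minimum frame + 8 preamble + 12 inter-frame gap


def gbits_per_second(packets_per_second, bytes_per_packet):
    """Convert a packet rate and per-packet size into gigabits per second."""
    return packets_per_second * bytes_per_packet * 8 / 1e9


BILLION = 1_000_000_000

# One billion packets per second.
per_second_low = gbits_per_second(BILLION, TCP_IP_HEADERS)     # 320.0
per_second_high = gbits_per_second(BILLION, ETHERNET_ON_WIRE)  # 672.0

# One billion packets per minute, i.e. the rate divided by 60.
per_minute_low = gbits_per_second(BILLION / 60, TCP_IP_HEADERS)
per_minute_high = gbits_per_second(BILLION / 60, ETHERNET_ON_WIRE)

print(f"1e9 packets/s:   {per_second_low:.0f} to {per_second_high:.0f} Gbit/s")
print(f"1e9 packets/min: {per_minute_low:.1f} to {per_minute_high:.1f} Gbit/s")
```

Under these assumptions, a single billion packets per second lands in the hundreds of gigabits, while the same count per minute is roughly 5 to 11 Gbit/s. "Several billion" scales both figures linearly.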

replies(2): >>45068644 #>>45069180 #
PaulHoule ◴[] No.45069180[source]
If somebody decides they hate you, your site that could handle, say, 100,000 legitimate requests per day could suddenly get billions of illegitimate requests.
replies(2): >>45069486 #>>45070817 #
lucb1e ◴[] No.45070817[source]
They could. Let me know when it happens.

I have this argument every time self hosting comes up, and every time I wonder if someone will do it to me to make a point. Or whether one of the million or so other comments I post, or one of the many tools that I host, will upset someone. Yet to happen, idk. It's like arguing whether you need a knife on the street at all times because someone might get angry from a look. It happens, we have a word for it in NL (zinloos geweld) and tiles in sidewalks (ladybug depictions) and everything, but no normal person actually carries weapons 24/7 (drug dealers surely yeah) or has people talk through a middle person.

I'd suspect other self hosters just see more shit than I do, except that nobody ever says it happened to them. The only argument I ever hear is that they want to be "safe" while "self hosting with cloudflare". Who's really hosting your shit then?

replies(1): >>45075265 #
PaulHoule ◴[] No.45075265[source]
I've had my involvement with the computer underground.

A web site owner published something he really shouldn't have and got hacked. I wound up being a "person of interest" in the resulting FBI investigation because I was the weirdest person in the chat room for the site. I think it drove them crazy that I was using Tor, so they got somebody to try to entrap me into sharing CP, but (1) I'm not interested and (2) I know better than that.

replies(1): >>45096447 #
1. lucb1e ◴[] No.45096447[source]
That's definitely the most interesting response I've had to this question, thanks for that.

I will have to give this a second thought, but here is a first one now that I've read this: ...would Cloudflare have helped against the FBI, or against any foreign nation filing a request with Cloudflare over child porn? Surely not?! A different kind of opsec is surely more relevant there, so I don't know if it really applies to "normal", legal self hosting communities (as opposed to criminal ones, much less that level of unethical+criminal), or if there's an aspect I'm missing here.