The web does not need gatekeepers: Cloudflare’s new “signed agents” pitch

(positiveblue.substack.com)

TIPSIO ◴[29 Aug 25 16:57 UTC] No.45066555[source]▶

Everyone loves the dream of a free for all and open web.

But the reality is how can someone small protect their blog or content from AI training bots? E.g., do they just blindly trust that whoever is sending the traffic honestly separates agent bots from training bots and is super-duper respecting robots.txt? Get real...

Or, fine, what if they do respect robots.txt, but they buy data that may or may not have been shielded through liability layers as "licensed data"?

Unless you're Reddit, X, Google, or Meta, with scary unlimited-budget legal teams, you have no power.

Great video: https://www.youtube.com/shorts/M0QyOp7zqcY

replies(37): >>45066600 #>>45066626 #>>45066827 #>>45066906 #>>45066945 #>>45066976 #>>45066979 #>>45067024 #>>45067058 #>>45067180 #>>45067399 #>>45067434 #>>45067570 #>>45067621 #>>45067750 #>>45067890 #>>45067955 #>>45068022 #>>45068044 #>>45068075 #>>45068077 #>>45068166 #>>45068329 #>>45068436 #>>45068551 #>>45068588 #>>45069623 #>>45070279 #>>45070690 #>>45071600 #>>45071816 #>>45075075 #>>45075398 #>>45077464 #>>45077583 #>>45080415 #>>45101938 #

wvenable ◴[29 Aug 25 18:51 UTC] No.45067955[source]▶

>>45066555 #

> Everyone loves the dream of a free for all and open web... But the reality is how can someone small protect their blog or content from AI training bots?

Aren't these statements entirely in conflict? You either have a free for all open web or you don't. Blocking AI training bots is not free and open for all.

replies(8): >>45067998 #>>45068139 #>>45068376 #>>45068589 #>>45068929 #>>45069170 #>>45073712 #>>45074969 #

BrenBarn ◴[29 Aug 25 18:55 UTC] No.45067998[source]▶

>>45067955 #

I think that was the point. Everyone loves the dream, but the reality is different.

replies(1): >>45068015 #

wilson090 ◴[29 Aug 25 18:58 UTC] No.45068015[source]▶

>>45067998 #

How so? If you don't want AI bots reading information on the web, you don't actually want a free and open web. The reality of an open web is that such information is free and available for anyone.

replies(6): >>45068058 #>>45068155 #>>45068305 #>>45068547 #>>45068621 #>>45068828 #

1. gradstudent ◴[29 Aug 25 19:02 UTC] No.45068058[source]▶

>>45068015 #

How is it available for everyone if the AI bots bring down your server?

replies(5): >>45068142 #>>45068202 #>>45068241 #>>45068453 #>>45068709 #

2. sebasvisser ◴[29 Aug 25 19:09 UTC] No.45068142[source]▶

>>45068058 (TP) #

Build better

3. mikestorrent ◴[29 Aug 25 19:14 UTC] No.45068202[source]▶

>>45068058 (TP) #

Ultimately, you have to realize that this is a losing battle, unless we have completely draconian control over every piece of silicon. Captchas are being defeated; at this point they're basically just mechanisms to prove you Really Want to Make That Request to the extent that you'll spend some compute time on it, which is starting to become a bit of a waste of electricity and carbon.

Talented people that want to scrape or bot things are going to find ways to make that look human. If that comes in the form of tricking a physical iPhone by automatically driving the screen physically, so be it; many such cases already!

The techniques you need for preventing DDoS don't really need to differentiate that much between bots and people unless you're being specifically targeted; Fail2Ban-style IP bans are still quite effective, and basic WAF functionality does a lot.
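For readers unfamiliar with the idea, a Fail2Ban-style ban boils down to "too many offending requests from one IP inside a time window means a temporary ban". The sketch below is illustrative only: the `BanList` class and its method names are hypothetical, the parameters merely echo Fail2Ban's `maxretry`, `findtime`, and `bantime` settings, and this is not how Fail2Ban itself is implemented (it watches log files and updates firewall rules).

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BanList:
    """Hypothetical in-memory ban list; not Fail2Ban's actual code."""

    def __init__(self, max_retry: int = 5, find_time: float = 600.0,
                 ban_time: float = 3600.0) -> None:
        self.max_retry = max_retry    # offenses tolerated inside the window
        self.find_time = find_time    # window length in seconds
        self.ban_time = ban_time      # ban duration in seconds
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    def record_failure(self, ip: str, now: float) -> bool:
        """Record one offending request; return True if this triggers a ban."""
        window = self.failures[ip]
        window.append(now)
        # Forget offenses that have aged out of the window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False

    def is_banned(self, ip: str, now: float) -> bool:
        """Return True while the ban for this IP has not yet expired."""
        until = self.banned_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self.banned_until[ip]  # ban expired, allow traffic again
            return False
        return True
```

What counts as an "offense" (a burst of 404s, ignored 429 responses, hits on a honeypot URL) is a policy choice left out here. Timestamps are passed in explicitly so the logic can be checked without waiting on a real clock.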

replies(1): >>45091177 #

4. edoceo ◴[29 Aug 25 19:18 UTC] No.45068241[source]▶

>>45068058 (TP) #

Everyone can get it from the bots now?

5. ForHackernews ◴[29 Aug 25 19:37 UTC] No.45068453[source]▶

>>45068058 (TP) #

Rate-limits? Use a CDN? Lots of traffic can be a problem whether it's bots or humans.
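As a rough illustration of what per-client rate limiting can look like, here is a minimal token-bucket sketch. The `TokenBucket` and `RateLimiter` names, the default rate, and keying on the client IP are all assumptions made for the example, not something proposed in the thread; real deployments usually do this in the reverse proxy or CDN rather than in application code.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """One bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token for this request
            return True
        return False


class RateLimiter:
    """Hypothetical per-client limiter keyed on IP address."""

    def __init__(self, rate: float = 1.0, capacity: float = 5.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_ip] = bucket
        return bucket.allow(now)
```

A request handler would call `limiter.allow(ip)` and answer with HTTP 429 when it returns `False`. Note that this treats bots and humans identically, and a crawler spread across many IP addresses would get one bucket per address.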

replies(1): >>45069881 #

6. wvenable ◴[29 Aug 25 20:00 UTC] No.45068709[source]▶

>>45068058 (TP) #

Is that really the problem we are discussing? I've had people attack my server and bring it down. But that has nothing to do with being free and open to everyone. A top Hacker News post could take my server down.

replies(1): >>45069858 #

7. danudey ◴[29 Aug 25 21:53 UTC] No.45069858[source]▶

>>45068709 #

Yes, because a top Hacker News post takes your server down because a large number of actual humans are looking to gain actual value from your posts. Meanwhile, you stand to benefit from the HN discussion by learning new things and perspectives from the community.

The AI bot assault, on the other hand, is one company (or a few companies) re-fetching the same data over and over again, constantly, in perpetuity, just in case it's changed, all so they can incorporate it into their training set and make money off of it while giving you zero credit and providing zero feedback.

replies(1): >>45070023 #

8. danudey ◴[29 Aug 25 21:55 UTC] No.45069881[source]▶

>>45068453 #

You realize this entire thread is about a pitch from a CDN company trying to solve an issue that has presented itself at such a scale that this is the best option they can think of to keep the web alive, right?

"Use a CDN" is not sufficient when these bots are so incredibly poorly behaved, because you're still paying for that CDN and this bad behavior is going to cost you a fortune in CDN costs (or cost the CDN a fortune instead, which is why Cloudflare is suggesting this).

9. wvenable ◴[29 Aug 25 22:13 UTC] No.45070023{3}[source]▶

>>45069858 #

But then we get to use those AI tools.

The refrain here comes down not to "AI" but mostly to "the AI bot assault", which is a different thing. Sure, let's have a discussion about badly behaved and overzealous web scrapers. As for credit, I've asked AI for its references and gotten them. If my information is merely mushed into an AI training model, I'm not sure why I need credit. If you discuss this thread with your friends are you going to give me credit?

replies(2): >>45072211 #>>45072463 #

10. frm88 ◴[30 Aug 25 05:55 UTC] No.45072211{4}[source]▶

>>45070023 #

"If you discuss this thread with your friends are you going to give me credit?"

Yes. How else would I enable my friends to look it up for themselves?

replies(1): >>45077584 #

11. tsimionescu ◴[30 Aug 25 06:51 UTC] No.45072463{4}[source]▶

>>45070023 #

No, you don't "get to" use the AI tools. You have to buy access to them (beyond some free trials).

replies(1): >>45077577 #

12. wvenable ◴[30 Aug 25 20:04 UTC] No.45077577{5}[source]▶

>>45072463 #

Yes. I get to buy access to them. They're providing an expensive-to-provide service that requires specialized expertise. I don't see the problem with that.

13. wvenable ◴[30 Aug 25 20:05 UTC] No.45077584{5}[source]▶

>>45072211 #

Six months from now, when you've internalized this entire thread, are you even going to remember where you got it from?

replies(1): >>45080671 #

14. frm88 ◴[31 Aug 25 05:50 UTC] No.45080671{6}[source]▶

>>45077584 #

Why are you shifting the discussion by adding two new variables (time/memory)?

replies(1): >>45087055 #

15. wvenable ◴[31 Aug 25 21:02 UTC] No.45087055{7}[source]▶

>>45080671 #

Because that's how one interacts with AI.

replies(2): >>45087955 #>>45090441 #

16. frm88 ◴[01 Sep 25 07:40 UTC] No.45090441{8}[source]▶

>>45087055 #

Yeah. Running out of arguments, are you?

17. account42 ◴[01 Sep 25 09:48 UTC] No.45091177[source]▶

>>45068202 #

Agreed, copyright issues need to be solved via legislation and network abuse issues need to be solved by network operators. Trying to run around either only makes the web worse for everyone.
