The web doesn't need to know if you're a human, a bot, or a dog. It just needs to serve bytes to whoever asks, within reasonable resource constraints. That's it. That's the open web. You'll miss it when it's gone.
A basic Varnish setup should get you most of the way there, no agent signing required!
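For context on what "a basic Varnish setup" actually buys you: Varnish itself is configured in its own VCL language, but the core idea is just a caching reverse proxy sitting in front of the origin. A rough sketch of that idea in Go, with a made-up origin address and TTL (Varnish handles far more in practice: Cache-Control, grace mode, purging, and so on):

```go
// Minimal caching reverse proxy: repeated GETs for the same URL are served
// from memory instead of hitting the origin every time. Illustrative only.
package main

import (
	"io"
	"log"
	"net/http"
	"sync"
	"time"
)

type entry struct {
	body    []byte
	header  http.Header
	expires time.Time
}

var (
	mu    sync.Mutex
	cache = map[string]entry{}
)

const origin = "http://127.0.0.1:8080" // hypothetical backend address
const ttl = 5 * time.Minute            // arbitrary TTL for the sketch

func handler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "this sketch only caches GET", http.StatusMethodNotAllowed)
		return
	}
	key := r.URL.RequestURI()

	mu.Lock()
	e, ok := cache[key]
	mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		for k, v := range e.header {
			w.Header()[k] = v
		}
		w.Write(e.body) // cache hit: the origin never sees this request
		return
	}

	resp, err := http.Get(origin + key) // cache miss: ask the origin once
	if err != nil {
		http.Error(w, "origin unreachable", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		http.Error(w, "bad origin response", http.StatusBadGateway)
		return
	}

	if resp.StatusCode == http.StatusOK {
		mu.Lock()
		cache[key] = entry{body: body, header: resp.Header.Clone(), expires: time.Now().Add(ttl)}
		mu.Unlock()
	}

	for k, v := range resp.Header {
		w.Header()[k] = v
	}
	w.WriteHeader(resp.StatusCode)
	w.Write(body)
}

func main() {
	log.Fatal(http.ListenAndServe(":80", http.HandlerFunc(handler)))
}
```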
So no, this advice has been outdated for decades.
Also, you're doing some sort of victim blaming where everyone on earth has to engineer their service to withstand DoS instead of outsourcing that to someone else. Abusers outsource their attacks to everyone else's machines (decentralization ftw!), but victims can't outsource their defense because centralization goes against your ideals.
At least lament the naive infrastructure of the internet or something, sheesh.
In the days before mandatory TLS it was so easy to set up a Squid proxy on the edge of my network and cache every plain-HTTP resource for as long as I wanted.
Like yeah, yeah, sure, it sucked that ISPs could inject trackers and stuff into page contents, but I'm starting to think the downsides of mandatory TLS outweigh the upsides. We made the web more Secure at the cost of making it less Private. We got Google Analytics and all the other spyware running over TLS and simultaneously made it that much harder for any normal person to host anything online.
"Victim blaming"? Can we please leave these therapy-speak terms back in the 2010s where they belong and out of technical discussions? If expecting basic caching is victim blaming, then so is expecting HTTPS, password hashing, or any technical competence whatsoever.
Your decentralization point actually proves mine: yes, attackers distribute while defenders centralize. That's why we shouldn't make centralization mandatory! Right now you can choose Cloudflare. With attestation, they become the web's border control.
The fine article makes it clear what this is really about: Cloudflare wants to be the gatekeeper for agent traffic. Agent attestation doesn't solve volumetric attacks (those need the DDoS protection they already sell, no new proposal required!). They're creating an allowlist where they decide who's "legitimate."
But sure, let's restructure the entire web's trust model because some sites can't configure a cache. That seems proportional.
You didn't just disagree with AI crawler attestation: you're saying that nobody should distinguish earnest users from everything else, that sites should simply bear the cost of serving both, and that cost necessarily includes bad traffic and incidental DoS.
Once again, services like Cloudflare exist because a cache isn't sufficient to deal with arbitrary traffic, and the scale of modern abuse is so large that only a few megacorps can provide the service that people want.
The post you're replying to points out that, at a certain scale, even caching things in memory isn't enough to keep your system from falling over when user agents (e.g. AI scraper bots) behave like bad actors, ignoring robots.txt and fetching every URL twenty times a day while completely ignoring cache headers, Last-Modified, etc.
Your points were all valid when we were dealing with "legitimate users", "legitimate good-faith bots", and "bad actors", but now the AI companies' need for massive amounts of up-to-the-minute content at all costs means we have to add "legitimate bad-faith bots" to the mix.
> Agent attestation doesn't solve volumetric attacks (those need the DDoS protection they already sell, no new proposal required!). They're creating an allowlist where they decide who's "legitimate."
Agent attestation solves overzealous AI scraping, which looks like a volumetric attack, because if you refuse to provide the content to the bots, the bots will leave you alone (or at least they won't chew up your bandwidth by re-fetching the same content over and over all day).
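To make the bandwidth point concrete, here's a rough sketch of gating unattested bot traffic at the edge. The header check, the looksLikeBot heuristic, and verifyAgentSignature are hypothetical stand-ins for illustration, not the actual proposal's wire format:

```go
// Hypothetical sketch: refuse content to crawlers that don't present a
// verifiable signature, so a rejected bot costs a tiny 403 instead of a full
// page render and transfer. All names here are stand-ins.
package main

import (
	"log"
	"net/http"
)

// verifyAgentSignature is a placeholder for whatever check a real attestation
// scheme would specify (e.g. validating a signature against a published key).
func verifyAgentSignature(r *http.Request) bool {
	return r.Header.Get("Signature") != "" // stand-in check only
}

// looksLikeBot is also a stand-in; real deployments use UA strings, IP
// reputation, behavior, and so on.
func looksLikeBot(r *http.Request) bool {
	return r.Header.Get("User-Agent") == ""
}

func requireAttestation(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if looksLikeBot(r) && !verifyAgentSignature(r) {
			// Nothing expensive runs and no content bytes go out.
			http.Error(w, "automated clients must present a verifiable signature", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	page := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("the expensive-to-serve content"))
	})
	log.Fatal(http.ListenAndServe(":8080", requireAttestation(page)))
}
```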
If you have three local machines behind that proxy, you might be able to turn three requests into one, and only if they all visit the same site rather than three different people visiting three different sites.
If you do this on the server, a request that requires executing PHP code and three SQL queries goes from happening on every request for the same resource to happening once; subsequent requests just shovel the cached response back out the pipe instead of processing it again. Instead of reducing the number of requests that reach the back end by 3:1, you reduce it by a million to one.
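A sketch of what that looks like on the origin side, with expensiveRender as a stand-in for the PHP-plus-three-queries work and an arbitrary TTL: the expensive path runs once per TTL window no matter how many requests arrive in it.

```go
// Origin-side caching sketch: expensiveRender (standing in for "PHP code and
// three SQL queries") runs once per TTL window; every other request in that
// window just gets the cached bytes shoveled back out.
package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

const ttl = time.Minute // arbitrary for the sketch

var (
	mu       sync.Mutex
	cached   []byte
	cachedAt time.Time
)

// expensiveRender stands in for the scripting and database work.
func expensiveRender() []byte {
	time.Sleep(50 * time.Millisecond) // pretend this is PHP + three SQL queries
	return []byte("<html>rendered page</html>")
}

func handler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	if cached == nil || time.Since(cachedAt) > ttl {
		cached = expensiveRender() // the back end does real work once per TTL
		cachedAt = time.Now()
	}
	body := cached
	mu.Unlock()
	w.Write(body) // everyone else gets bytes straight from memory
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}
```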
And that doesn't cause any HSTS problems, because a reverse proxy operated by the site owner holds the site's real certificate.
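In terms of the sketches above, that just means the owner-operated proxy terminates TLS itself. A drop-in replacement for the caching proxy's main(), with placeholder certificate paths:

```go
// The owner-operated reverse proxy serves the site's real certificate, so
// HSTS policies stay satisfied. Certificate paths are placeholders.
func main() {
	log.Fatal(http.ListenAndServeTLS(":443",
		"/etc/ssl/example.org/fullchain.pem", // hypothetical cert path
		"/etc/ssl/example.org/privkey.pem",   // hypothetical key path
		http.HandlerFunc(handler)))
}
```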