The web doesn't need to know if you're a human, a bot, or a dog. It just needs to serve bytes to whoever asks, within reasonable resource constraints. That's it. That's the open web. You'll miss it when it's gone.
A basic Varnish setup should get you most of the way there, no agent signing required!
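For context on what "a basic Varnish setup" actually buys you: Varnish itself is configured in its own VCL language, but the core idea is just a caching reverse proxy sitting in front of the origin. A rough sketch of that idea in Go, with a made-up origin address and TTL (Varnish handles far more in practice: Cache-Control, grace mode, purging, and so on):

```go
// Minimal caching reverse proxy: repeated GETs for the same URL are served
// from memory instead of hitting the origin every time. Illustrative only.
package main

import (
	"io"
	"log"
	"net/http"
	"sync"
	"time"
)

type entry struct {
	body    []byte
	header  http.Header
	expires time.Time
}

var (
	mu    sync.Mutex
	cache = map[string]entry{}
)

const origin = "http://127.0.0.1:8080" // hypothetical backend address
const ttl = 5 * time.Minute            // arbitrary TTL for the sketch

func handler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "this sketch only caches GET", http.StatusMethodNotAllowed)
		return
	}
	key := r.URL.RequestURI()

	mu.Lock()
	e, ok := cache[key]
	mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		for k, v := range e.header {
			w.Header()[k] = v
		}
		w.Write(e.body) // cache hit: the origin never sees this request
		return
	}

	resp, err := http.Get(origin + key) // cache miss: ask the origin once
	if err != nil {
		http.Error(w, "origin unreachable", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		http.Error(w, "bad origin response", http.StatusBadGateway)
		return
	}

	if resp.StatusCode == http.StatusOK {
		mu.Lock()
		cache[key] = entry{body: body, header: resp.Header.Clone(), expires: time.Now().Add(ttl)}
		mu.Unlock()
	}

	for k, v := range resp.Header {
		w.Header()[k] = v
	}
	w.WriteHeader(resp.StatusCode)
	w.Write(body)
}

func main() {
	log.Fatal(http.ListenAndServe(":80", http.HandlerFunc(handler)))
}
```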
So no, this advice has been outdated for decades.
Also, you're doing some sort of victim blaming where everyone on earth has to engineer their service to withstand DoS instead of outsourcing that to someone else. Abusers outsource their attacks to everyone else's machines (decentralization ftw!), but victims can't outsource their defense because centralization goes against your ideals.
At least lament the naive infrastructure of the internet or something, sheesh.
In the days before mandatory TLS it was so easy to set up a Squid proxy on the edge of my network and cache every plain-HTTP resource for as long as I wanted.
Like yeah, yeah, sure, it sucked that ISPs could inject trackers and stuff into page contents, but I'm starting to think the downsides of mandatory TLS outweigh the upsides. We made the web more Secure at the cost of making it less Private. We got Google Analytics and all the other spyware running over TLS and simultaneously made it that much harder for any normal person to host anything online.
"Victim blaming"? Can we please leave these therapy-speak terms back in the 2010s where they belong and out of technical discussions? If expecting basic caching is victim blaming, then so is expecting HTTPS, password hashing, or any technical competence whatsoever.
Your decentralization point actually proves mine: yes, attackers distribute while defenders centralize. That's why we shouldn't make centralization mandatory! Right now you can choose Cloudflare. With attestation, they become the web's border control.
The fine article makes it clear what this is really about: Cloudflare wants to be the gatekeeper for agent traffic. Agent attestation doesn't solve volumetric attacks (those need the DDoS protection they already sell, no new proposal required!). They're creating an allowlist where they decide who's "legitimate."
But sure, let's restructure the entire web's trust model because some sites can't configure a cache. That seems proportional.
You didn't just disagree with AI crawler attestation: you're saying that nobody should distinguish earnest users from everything else, that sites should simply bear the cost of serving both, and that cost necessarily includes bad traffic and incidental DoS.
Once again, services like Cloudflare exist because a cache isn't sufficient to deal with arbitrary traffic, and the scale of modern abuse is so large that only a few megacorps can provide the service that people want.
The post you're replying to points out that, at a certain scale, even caching things in memory isn't enough to keep your system from falling over when user agents (e.g. AI scraper bots) behave like bad actors, ignoring robots.txt and fetching every URL twenty times a day while completely ignoring cache headers, Last-Modified, etc.
Your points were all valid when we were dealing with "legitimate users", "legitimate good-faith bots", and "bad actors", but now the AI companies' need for massive amounts of up-to-the-minute content at all costs means we have to add "legitimate bad-faith bots" to the mix.
> Agent attestation doesn't solve volumetric attacks (those need the DDoS protection they already sell, no new proposal required!). They're creating an allowlist where they decide who's "legitimate."
Agent attestation solves overzealous AI scraping, which looks like a volumetric attack, because if you refuse to provide the content to the bots, the bots will leave you alone (or at least they won't chew up your bandwidth by re-fetching the same content over and over all day).
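To make the bandwidth point concrete, here's a rough sketch of gating unattested bot traffic at the edge. The header check, the looksLikeBot heuristic, and verifyAgentSignature are hypothetical stand-ins for illustration, not the actual proposal's wire format:

```go
// Hypothetical sketch: refuse content to crawlers that don't present a
// verifiable signature, so a rejected bot costs a tiny 403 instead of a full
// page render and transfer. All names here are stand-ins.
package main

import (
	"log"
	"net/http"
)

// verifyAgentSignature is a placeholder for whatever check a real attestation
// scheme would specify (e.g. validating a signature against a published key).
func verifyAgentSignature(r *http.Request) bool {
	return r.Header.Get("Signature") != "" // stand-in check only
}

// looksLikeBot is also a stand-in; real deployments use UA strings, IP
// reputation, behavior, and so on.
func looksLikeBot(r *http.Request) bool {
	return r.Header.Get("User-Agent") == ""
}

func requireAttestation(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if looksLikeBot(r) && !verifyAgentSignature(r) {
			// Nothing expensive runs and no content bytes go out.
			http.Error(w, "automated clients must present a verifiable signature", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	page := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("the expensive-to-serve content"))
	})
	log.Fatal(http.ListenAndServe(":8080", requireAttestation(page)))
}
```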
If you have three local machines behind that proxy, you might be able to turn three requests into one, and only if they all visit the same site rather than three different people visiting three different sites.
If you do this on the server, a request that requires executing PHP code and three SQL queries goes from happening on every request for the same resource to happening once; subsequent requests just shovel the cached response back out the pipe instead of processing it again. Instead of reducing the number of requests that reach the back end by 3:1, you reduce it by a million to one.
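A sketch of what that looks like on the origin side, with expensiveRender as a stand-in for the PHP-plus-three-queries work and an arbitrary TTL: the expensive path runs once per TTL window no matter how many requests arrive in it.

```go
// Origin-side caching sketch: expensiveRender (standing in for "PHP code and
// three SQL queries") runs once per TTL window; every other request in that
// window just gets the cached bytes shoveled back out.
package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

const ttl = time.Minute // arbitrary for the sketch

var (
	mu       sync.Mutex
	cached   []byte
	cachedAt time.Time
)

// expensiveRender stands in for the scripting and database work.
func expensiveRender() []byte {
	time.Sleep(50 * time.Millisecond) // pretend this is PHP + three SQL queries
	return []byte("<html>rendered page</html>")
}

func handler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	if cached == nil || time.Since(cachedAt) > ttl {
		cached = expensiveRender() // the back end does real work once per TTL
		cachedAt = time.Now()
	}
	body := cached
	mu.Unlock()
	w.Write(body) // everyone else gets bytes straight from memory
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}
```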
And that doesn't cause any HSTS problems, because a reverse proxy operated by the site owner holds the site's real certificate.
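In terms of the sketches above, that just means the owner-operated proxy terminates TLS itself. A drop-in replacement for the caching proxy's main(), with placeholder certificate paths:

```go
// The owner-operated reverse proxy serves the site's real certificate, so
// HSTS policies stay satisfied. Certificate paths are placeholders.
func main() {
	log.Fatal(http.ListenAndServeTLS(":443",
		"/etc/ssl/example.org/fullchain.pem", // hypothetical cert path
		"/etc/ssl/example.org/privkey.pem",   // hypothetical key path
		http.HandlerFunc(handler)))
}
```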