
211 points CrankyBear | 5 comments
thaumaturgy No.45107225
People outside of a really small sysadmin niche really don't grasp the scale of this problem.

I run a small-but-growing boutique hosting infrastructure for agency clients. The AI bot crawler problem recently got severe enough that I couldn't just ignore it anymore.

I'm stuck between, on one end, crawlers from companies that absolutely have the engineering talent and resources to do things right but still aren't, and on the other end, resource-heavy WordPress installations where the client was told it was a build-it-and-forget-it kind of thing. I can't police their robots.txt files; meanwhile, each page load can take a full 1s round trip (most of that spent in MySQL), there are about 6 different pretty aggressive AI bots, and occasionally they'll get stuck on some site's product variants or categories pages and start hitting it at a rate of 1 request per second.

There's an invisible caching layer that does a pretty nice job with images and the like, so it's not really a bandwidth problem. The bots aren't even requesting images and other page resources very often; they're just doing tons and tons of page requests, and each of those is tying up a DB somewhere.

Cumulatively, it is close to having a site get Slashdotted every single day.

I finally started filtering out most bot and crawler traffic at nginx, before it gets passed off to a WP container. I spent a fair bit of time sampling traffic from logs, and at a rough guess, I'd say maybe 5% of web traffic is currently coming from actual humans. It's insane.
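
For anyone wanting to try the same kind of log sampling, here is a minimal sketch. It is not the tooling described above; it assumes nginx's "combined" log format, and the `BOT_MARKERS` list is a hypothetical, incomplete set of User-Agent substrings picked for illustration.

```python
import re
from collections import Counter

# Assumption: nginx "combined" log format, where the User-Agent
# is the last double-quoted field on each line.
UA_RE = re.compile(r'"([^"]*)"\s*$')

# Hypothetical, incomplete list of substrings used by self-identified crawlers.
BOT_MARKERS = ("bot", "crawler", "spider", "gptbot", "claudebot", "ccbot", "bytespider")


def classify(line: str) -> str:
    """Label one access-log line as 'bot', 'other', or 'unparsed'."""
    match = UA_RE.search(line)
    if not match:
        return "unparsed"
    user_agent = match.group(1).lower()
    if any(marker in user_agent for marker in BOT_MARKERS):
        return "bot"
    return "other"


def summarize(lines):
    """Return the fraction of lines that fall into each label."""
    counts = Counter(classify(line) for line in lines)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {label: count / total for label, count in counts.items()}
```

Note that "other" is not the same as "human": crawlers that spoof a browser User-Agent land there too, so a number from this kind of script is a lower bound on bot traffic.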

I've just wrapped up the first round of work for this problem, but that's just buying a little time. Now, I've gotta put together an IP intelligence system, because clearly these companies aren't gonna take "403" for an answer.
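
One possible building block for that kind of system, as a rough sketch only: count requests per network prefix in a sliding window and flag prefixes that exceed a limit. The `RateTracker` class, the /24 and /48 prefix sizes, and the thresholds are all assumptions made for illustration, not anything from the setup described above.

```python
import ipaddress
from collections import defaultdict, deque


class RateTracker:
    """Sliding-window request counter keyed by network prefix (hypothetical helper)."""

    def __init__(self, window_s=60.0, limit=30, prefix_v4=24, prefix_v6=48):
        self.window_s = window_s
        self.limit = limit
        self.prefix_v4 = prefix_v4
        self.prefix_v6 = prefix_v6
        self.hits = defaultdict(deque)  # network -> timestamps inside the window

    def _key(self, ip: str):
        # Group addresses by prefix so a crawler rotating IPs within one
        # block is still counted as a single source.
        addr = ipaddress.ip_address(ip)
        prefix = self.prefix_v4 if addr.version == 4 else self.prefix_v6
        return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

    def hit(self, ip: str, now: float) -> bool:
        """Record a request at time `now`; return True if the prefix is over the limit."""
        timestamps = self.hits[self._key(ip)]
        timestamps.append(now)
        # Drop timestamps that have aged out of the window.
        while timestamps and timestamps[0] <= now - self.window_s:
            timestamps.popleft()
        return len(timestamps) > self.limit
```

Usage would be something like calling `tracker.hit(remote_addr, time.time())` for each parsed log line and feeding the flagged prefixes into a deny list. Grouping by prefix is a blunt instrument and can catch unrelated users who share a block.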

replies(5): >>45107483 #>>45107586 #>>45108498 #>>45109192 #>>45110318 #
1. gjsman-1000 No.45107483
I might write a blog post on this, but I seriously believe we collectively need to rethink The Cathedral and the Bazaar.

The Cathedral won. Full stop. Everyone, more or less, is just a stonecutter, competing to sell the best stone (i.e. content, libraries, source code, tooling) for building the cathedrals with. If the world is a farmers' market, we're shocked that the farmers' market is not defeating Walmart, and it never will.

People want Cathedrals, not Bazaars. Being a Bazaar vendor is a race to the bottom. This is not the Cathedral exploiting a "tragedy of the commons"; it's intrinsic to decentralization as a whole. The Bazaar feeds the Cathedral, just as the farmers feed Walmart, just as independent websites feed Claude: a food chain, not an aberration.

replies(2): >>45107893 #>>45109253 #
2. thaumaturgy No.45107893
The Cathedral and the Bazaar meets The Tragedy of the Commons.

Let's say there are two competing options in some market. One option is fully commercialized, the other option holds to open-source ideals (whatever those are).

The commercial option attracts investors, because investors like money. The money attracts engineers, because at some point "hacker" came to mean "comfortable lifestyle in a high COL area". The commercial option gets all the resources, it gets a marketing team, and it captures 75% of the market because most people will happily pay a few dollars for something they don't have to understand.

The open source option attracts a few enthusiasts (maybe; or, often, just one), who labor at it in whatever spare time they can scrape together. Because it's free, other commercial entities use and rely on the open source thing, as long as it continues to be maintained in something that, if you squint, resembles slave labor. The open source option is always a bit harder to use, with fewer features, but it appeals to the 25% of the market that cares about things like privacy or ownership or self-determination.

So, one conclusion is "people want Cathedrals", but another conclusion could be that all of our society's incentives are aligned towards Cathedrals.

It would be insane, after all, to not pursue wealth just because of some personal ideals.

replies(2): >>45107964 #>>45108507 #
3. gjsman-1000 No.45107964
The answer is quite simply that where complexity exceeds the regular person's interest, there will be a cathedral.

It's not about capitalism or incentives. Humans have cognitive limits and technology is very low on the list for most. They want someone else to handle complexity so they can focus on their lives. Medieval guilds, religious hierarchies, tribal councils, your distribution's package repository, it's all cathedrals. Humans have always delegated complexity to trusted authorities.

The 25% who 'care about privacy or ownership' mostly just say they care. When actually faced with configuring their own email server or compiling their own kernel, 24% of that 25% immediately choose the cathedral. You know the type, the people who attend FOSDEM carrying MacBooks. The incentives don't create the demand for cathedrals, but respond to it. Even in a post-scarcity commune, someone would emerge to handle the complex stuff while everyone else gratefully lets them.

The bazaar doesn't lose because of capitalism. It loses because most humans, given the choice between understanding something complex or trusting someone else to handle it, will choose trust every time. Not just trust, but CYA (I'm not responsible for something I don't fully understand) every time. Why do you think AI is successful? I'd rather even trust a blathering robot than myself. It turns out, people like being told what to do on things they don't care about.

4. rurp No.45108507
This is pretty much a more eloquent version of what I was about to write. It's dangerous to take a completely results-oriented view of a situation where the commercial incentives are so absurdly lopsided. The cathedral owners spend more than the GDP of most countries every year on various carrots and sticks to maintain something like the current ecosystem. I think the current world is far from ideal for most people, but it's hard to compete against the coordinated efforts of the richest and most powerful entities in the world.
5. AnthonyMouse No.45109253
> The Bazaar feeds the Cathedral

Isn't this the licensing problem? Berkeley releases BSD so that everyone can use it, people do years of work to make it passable, Apple takes it to make macOS and iOS because the license allows them to, and then they have both the community's work and their own work so everyone uses that.

The Linux kernel is GPLv2, not GPLv3, so vendors distribute binary blob drivers/firmware with their hardware, and the hardware becomes unusable as soon as they stop publishing new versions, because then to use the hardware you're stuck with an old kernel with known security vulnerabilities. Or they lock the boot loader, because v2 lacks the anti-Tivoization clause in v3.

If you use a license that lets the cathedral close off the community's work then you lose, but what if you don't do that?