Distributed systems programming has stalled

(www.shadaj.me)

287 points shadaj | 4 comments | 27 Feb 25 16:12 UTC

bsnnkv ◴[27 Feb 25 16:52 UTC] No.43196091[source]▶

>>43195702 (OP) #

Last month I switched from a role working on a distributed system (FAANG) to a role working on embedded software which runs on cards in data center racks.

I was in my last role for a year, and 90%+ of my time was spent investigating things that went "missing" at one of the many failure points between the many distributed components.

I wrote less than 200 lines of code that year and I experienced the highest level of burnout in my professional career.

The technical aspect that contributed the most to this burnout was both the lack of observability tooling and the lack of organizational desire to invest in it. Whenever I would bring up this gap I would be told that we can't spend time/money and wait for people to create "magic tools".

So far the culture in my new embedded (Rust, fwiw) position is the complete opposite. If you're burnt out working on distributed systems and you care about some of the same things that I do, it's worth giving embedded software dev a shot.

replies(24): >>43196122 #>>43196159 #>>43196163 #>>43196180 #>>43196239 #>>43196674 #>>43196899 #>>43196910 #>>43196931 #>>43197177 #>>43197902 #>>43198895 #>>43199169 #>>43199589 #>>43199688 #>>43199980 #>>43200186 #>>43200596 #>>43200725 #>>43200890 #>>43202090 #>>43202165 #>>43205115 #>>43208643 #

alabastervlog ◴[27 Feb 25 18:16 UTC] No.43196899[source]▶

>>43196091 #

I've found the rush to distributed computing when it's not strictly necessary kinda baffling. The costs in complexity are extreme. I can't imagine the median company doing this stuff is actually getting either better uptime or performance out of it—sure, it maybe recovers better if something breaks, maybe if you did everything right and regularly test that stuff (approximately nobody does though), but there's also so very much more crap that can break in the first place.

Plus: far worse performance ("but it scales smoothly" OK but your max probable scale, which I'll admit does seem high on paper if you've not done much of this stuff before, can fit on one mid-size server, you've just forgotten how powerful computers are because you've been in cloud-land too long...) and crazy-high costs for related hardware(-equivalents), resources, and services.

All because we're afraid to shell into an actual server and tail a log, I guess? I don't know what else it could be aside from some allergy to doing things the "old way"? I dunno man, seems way simpler and less likely to waste my whole day trying to figure out why, in fact, the logs I need weren't fucking collected in the first place, or got buried in some damn corner of our Cloud I'll never find without writing a 20-line "log query" in some awful language I never use for anything else, in some shitty web dashboard.

Fewer, or cheaper, personnel? I've never seen cloud transitions do anything but the opposite.

It's like the whole industry went collectively insane at the same time.

[EDIT] Oh, and I forgot, for everything you gain in cloud capabilities it seems like you lose two or three things that are feasible when you're running your own servers. Simple shit that's just "add two lines to the nginx config and do an apt-install" becomes three sprints of custom work or whatever, or just doesn't happen because it'd be too expensive. I don't get why someone would give that stuff up unless they really, really had to.

[EDIT EDIT] I get that this rant is more about "the cloud" than distributed systems per se, but trying to build "cloud native" is the way that most orgs accidentally end up dealing with distributed systems in a much bigger way than they have to.

replies(10): >>43197578 #>>43197608 #>>43197740 #>>43199134 #>>43199560 #>>43201628 #>>43201737 #>>43202751 #>>43204072 #>>43225726 #

1. jimbokun ◴[27 Feb 25 19:31 UTC] No.43197608[source]▶

>>43196899 #

Distributed or not is a very binary distinction. If you can run on one large server, great, just write everything in a non-distributed fashion.

But once you need that second server, everything about your application needs to work in a distributed fashion.

replies(2): >>43198610 #>>43228522 #

2. th0ma5 ◴[27 Feb 25 21:18 UTC] No.43198610[source]▶

>>43197608 (TP) #

I wish I could upvote you again. The complexity balloons when you try to adapt something that wasn't distributed, and often things can be way simpler and more robust if you start with a distributed concept.

replies(1): >>43207352 #

3. CogitoCogito ◴[28 Feb 25 16:21 UTC] No.43207352[source]▶

>>43198610 #

I couldn't disagree more. My principle is to write systems extremely simply and then distribute portions of them as it becomes necessary. It almost never becomes necessary, and in the rare cases it does, it is entirely straightforward to do so unless you have an over-complicated design. I don't think I've ever seen it done well when done in the opposite direction. It's always cost more in time and effort and resulted in something worse.

replies(1): >>43216746 #

4. th0ma5 ◴[01 Mar 25 07:11 UTC] No.43216746{3}[source]▶

>>43207352 #

Tons of vendors offer cloud-first, distributed deployments. Erlang is distributed by default. Spark is distributed by default. Most databases are distributed by default.

replies(1): >>43228529 #
