Distributed systems programming has stalled

(www.shadaj.me)

287 points shadaj | 5 comments | 27 Feb 25 16:12 UTC

bsnnkv ◴[27 Feb 25 16:52 UTC] No.43196091[source]▶

>>43195702 (OP) #

Last month I switched from a role working on a distributed system (FAANG) to a role working on embedded software which runs on cards in data center racks.

I was in my last role for a year, and 90%+ of my time was spent investigating things that went "missing" at one of the many failure points between the many distributed components.

I wrote less than 200 lines of code that year and I experienced the highest level of burnout in my professional career.

The technical factors that contributed the most to this burnout were the lack of observability tooling and the lack of organizational desire to invest in it. Whenever I brought up this gap, I was told that we couldn't spend time/money waiting for people to create "magic tools".

So far the culture in my new embedded (Rust, fwiw) position is the complete opposite. If you're burnt out working on distributed systems and you care about some of the same things that I do, it's worth giving embedded software dev a shot.

replies(24): >>43196122 #>>43196159 #>>43196163 #>>43196180 #>>43196239 #>>43196674 #>>43196899 #>>43196910 #>>43196931 #>>43197177 #>>43197902 #>>43198895 #>>43199169 #>>43199589 #>>43199688 #>>43199980 #>>43200186 #>>43200596 #>>43200725 #>>43200890 #>>43202090 #>>43202165 #>>43205115 #>>43208643 #

1. intelVISA ◴[28 Feb 25 00:21 UTC] No.43200186[source]▶

>>43196091 #

Distributed systems always end up as a dumping ground for failed tech solutions to deep org dysfunction.

Weak tech leadership? Let's "fix" that with some microservices.

Now it's FUBAR? Conceal it with some cloud native horrors, sacrifice a revolving door of 'smart' disempowered engineers to keep the theater going til you can jump to the next target.

Funny, because distributed systems have been pretty much solved since Lamport, 40+ years ago.

replies(2): >>43200288 #>>43200321 #

2. rbjorklin ◴[28 Feb 25 00:36 UTC] No.43200288[source]▶

>>43200186 (TP) #

Would you mind sharing some more specific information/references to Lamport’s work?

replies(2): >>43200329 #>>43200392 #

3. whstl ◴[28 Feb 25 00:42 UTC] No.43200321[source]▶

>>43200186 (TP) #

I suffered through this in two companies and man, it isn't easy.

The first one was a multi-billion unicorn that had everything converted to microservices, with everything customized in Kubernetes. One day I even had to fix a few bugs in the service mesh because the guy who wrote it had left and I was the only person not fighting fires who could write the language it was written in. I left right after the backend-of-the-frontend failed to sustain traffic during a month where they literally had zero customers (Corona).

At the second one there was a mandate to rewrite everything as microservices, and it took another team 5 months to migrate a single 100-line class I wrote into a microservice. It just wasn't meant to be. Then the only guy who knew how the infrastructure worked burned out after being yelled at too many times, got demoted, and last I heard he is at home with depression.

Weak leadership doesn't even begin to describe it, especially the second.

But remembering it is a nice reminder that a job is just a means of getting a payment.

4. vitus ◴[28 Feb 25 00:44 UTC] No.43200329[source]▶

>>43200288 #

The three big papers: clocks [0], Paxos [1], Byzantine generals [2].

[0] https://lamport.azurewebsites.net/pubs/time-clocks.pdf

[1] https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf

[2] https://lamport.azurewebsites.net/pubs/byz.pdf

Or, if you prefer wiki articles:

https://en.wikipedia.org/wiki/Lamport_timestamp

https://en.wikipedia.org/wiki/Paxos_(computer_science)

https://en.wikipedia.org/wiki/Byzantine_fault

I don't know that I would call it "solved", but he certainly contributed a huge amount to the field.
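For anyone who wants the gist of the clocks paper [0] before reading it, here is a minimal sketch of a Lamport logical clock. The class name, the `pid` field, and the method names are my own choices for illustration, not anything taken from the paper; it only shows the two basic rules (increment on each local event, and on receive jump past the sender's timestamp).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LamportClock:
    """Logical clock for one process (hypothetical names, illustration only)."""
    pid: int       # assumed process id, used only to break ties
    time: int = 0  # the logical counter

    def tick(self) -> int:
        # Rule 1: every local event advances the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending counts as an event; the returned value travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Rule 2: move past the sender's timestamp so the receive
        # is always ordered after the corresponding send.
        self.time = max(self.time, msg_time) + 1
        return self.time

    def stamp(self) -> Tuple[int, int]:
        # Comparing (time, pid) pairs gives a total order that is
        # consistent with the happened-before relation.
        return (self.time, self.pid)


a = LamportClock(pid=1)
b = LamportClock(pid=2)
a.tick()          # local event on a
t = a.send()      # a sends a message stamped with its clock
b.receive(t)      # b's clock jumps past the message timestamp
```

The tie-break on `pid` is what turns the partial order into a total one; which process "wins" a tie is arbitrary, it just has to be the same everywhere.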

5. madhadron ◴[28 Feb 25 00:53 UTC] No.43200392[source]▶

>>43200288 #

Lamport's website has his collected works. The paper to start with is "Time, clocks, and the ordering of events in a distributed system." Read it closely all the way to the end. Everyone seems to miss the last couple sections for some reason.
