287 points shadaj | 18 comments
1. rectang ◴[] No.43196141[source]
Ten years ago, I had lunch with Patricia Shanahan, who worked for Sun on multi-core CPUs several decades ago (before taking a post-career turn volunteering at the ASF, which is where I met her). There was a striking similarity between the problems Sun had been concerned with back then and the problems of the distributed systems that power so much of the world today.

Some time has passed since then — and yet, most people still develop software using sequential programming models, thinking about concurrency occasionally.

It is a durable paradigm. There has been no revolution of the sort that the author of this post yearns for. If "Distributed Systems Programming Has Stalled", it stalled a long time ago, and perhaps for good reasons.

replies(5): >>43196213 #>>43196377 #>>43196635 #>>43197344 #>>43197661 #
2. shadaj ◴[] No.43196213[source]
Stay tuned for the next blog post for one potential answer :) My PhD has been focused on this gap!
replies(1): >>43196347 #
3. rectang ◴[] No.43196347[source]
As a programmer, I hope that your answer continues to abstract away the problems of concurrency from me, the way that CPU designers have managed, so that I can still think sequentially except when I need to. (And as a senior engineer, sometimes you do need to: developing reliable concurrent systems is like a pilot landing a plane in bad weather, part of the job.)
replies(1): >>43196896 #
4. EtCepeyd ◴[] No.43196377[source]
> and perhaps for good reasons

For the very good reason that the underlying math is insanely complicated and tiresome for mere practitioners (which, although I have a background in math, I openly aim to be).

For example, even if you assume sequential consistency (which is an expensive assumption) in a multi-threaded C or C++ program, reasoning about the program isn't easy. And once you consider barriers, atomics, and load-acquire/store-release explicitly, the "SMP" (shared memory) proposition falls apart, and you can't avoid programming for a message passing system with independent actors -- be those separate networked servers, or separate CPUs on a board. I claim that struggling with async messaging between independent peers as a baseline is not why most people get interested in programming.
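
To make that concrete, here is a minimal sketch with plain C++11 atomics (illustrative only, not taken from the comment above) of why even "shared memory" ends up looking like message passing: the data flows one way, published by a release store and observed by an acquire load.

    #include <atomic>
    #include <cstdio>
    #include <thread>

    int payload = 0;                  // the "message" body
    std::atomic<bool> ready{false};   // the "message sent" flag

    void producer() {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish: earlier writes become visible
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // wait for the "message"
        std::printf("%d\n", payload);                       // guaranteed to print 42
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }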

Our systems (= normal motherboards on one end, and networked peer-to-peer systems on the other end) have become so concurrent that doing nearly anything efficiently nowadays requires us to think about messaging between peers, and that's very, very foreign to our traditional, sequential, imperative programming languages. (It's also foreign to how most of us think.)

Thus, I certainly don't want a simple (but leaky) software / programming abstraction that hides the underlying hardware complexity; instead, I want the hardware to be simple (as little internally distributed as possible), so that the simplicity of the (sequential, imperative) programming language then reflects and matches the hardware well. I think this can only be found in embedded nowadays (if at all), which is why I think many have been drawn to embedded recently.

replies(4): >>43196464 #>>43196786 #>>43197684 #>>43199865 #
5. gmadsen ◴[] No.43196464[source]
I know C++ has a lackluster implementation, but do coroutines and channels solve some of these complaints? Although not inherently multithreaded, many things shouldn't be multithreaded, just paused. And channels instead of shared memory can control ordering.
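
For the "paused, not multithreaded" case, a minimal C++20 generator sketch (a hand-rolled Generator type, since std::generator only arrives in C++23): everything runs on one thread, and the coroutine simply suspends at each co_yield until the caller asks for more.

    #include <coroutine>
    #include <cstdio>
    #include <exception>

    struct Generator {
        struct promise_type {
            int current = 0;
            Generator get_return_object() {
                return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
            }
            std::suspend_always initial_suspend() noexcept { return {}; }
            std::suspend_always final_suspend() noexcept { return {}; }
            std::suspend_always yield_value(int v) noexcept { current = v; return {}; }
            void return_void() noexcept {}
            void unhandled_exception() { std::terminate(); }
        };

        std::coroutine_handle<promise_type> handle;
        explicit Generator(std::coroutine_handle<promise_type> h) : handle(h) {}
        Generator(const Generator&) = delete;
        Generator& operator=(const Generator&) = delete;
        ~Generator() { if (handle) handle.destroy(); }

        bool next() { handle.resume(); return !handle.done(); }  // resume until the next co_yield
        int value() const { return handle.promise().current; }
    };

    Generator counter(int limit) {
        for (int i = 0; i < limit; ++i)
            co_yield i;  // pause here; no thread is blocked
    }

    int main() {
        Generator g = counter(3);
        while (g.next())
            std::printf("%d\n", g.value());  // prints 0, 1, 2
    }
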
replies(2): >>43196525 #>>43196850 #
6. EtCepeyd ◴[] No.43196525{3}[source]
I've found both explicit future/promise management and coroutines difficult (even irritating) to reason about. Coroutines look simpler on the surface (than explicit future chaining), and so their syntax is less atrocious, but there are nasty traps. For example:

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines...

7. hinkley ◴[] No.43196635[source]
I think the underlying premise of Cloud is:

Pay a 100% premium on compute resources in order to pretend the 8 Fallacies of Distributed Computing don’t exist.

I sat out the beginning of Cloud and was shocked at how completely absent they are from conversations within the space. When the hangover hits it’ll be ugly. The Devil always gets his due.

8. hinkley ◴[] No.43196786[source]
I think SaaS and multicore hardware are evolving together because a queue of unrelated, partially ordered tasks running in parallel is a hell of a lot easier to think about than trying to leverage 6-128 cores to keep from ending up with a single user process that’s wasting 84-99% of available resources.

Most people are not equipped to contend with Amdahl’s Law. Carving 5% out of the sequential part of a calculation is quickly becoming more time efficient than taking 50% out of the parallel parts, and we’ve spent 40 years beating the urge to reach for 1-4% improvements out of people. When people find out I got a 30% improvement by doing 8+6+4+4+3+2+1.5+1.5, they quickly find someplace else to be. The person who did the compressed pointer work on v8 to make it as fast as 64-bit pointers is the only other person in over a decade I’ve seen document working this way. If you’re reading this, we should do lunch.
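
Rough numbers to make the Amdahl's-law point concrete (an illustrative sketch, not the actual workload described above): once the parallel portion is already spread over many cores, trimming even 5% off the serial part beats halving the parallel work.

    #include <cstdio>

    // Wall-clock time of a job: serial part + parallel work spread over N cores.
    double wall_time(double serial, double parallel_work, int cores) {
        return serial + parallel_work / cores;
    }

    int main() {
        const int cores = 64;
        double base           = wall_time(0.20, 0.80, cores);         // 0.2125
        double half_parallel  = wall_time(0.20, 0.40, cores);         // 50% off the parallel work: 0.20625
        double smaller_serial = wall_time(0.20 * 0.95, 0.80, cores);  // 5% off the serial part: 0.2025
        std::printf("base %.4f  half-parallel %.4f  smaller-serial %.4f\n",
                    base, half_parallel, smaller_serial);
    }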

So because we discovered a lucrative, embarrassingly parallel problem domain, that’s basically what the entire industry has been doing for 15 years, since multicore became unavoidable. We have web services and compilers being multi-core and not a lot in between. How many video games still run something like three threads, each dedicated to a completely distinct task?

replies(2): >>43207818 #>>43208672 #
9. hinkley ◴[] No.43196850{3}[source]
Coroutines basically make the same observation as transmit windows in TCP/IP: you don’t send data as fast as you can if the other end can’t process it, but also if you send one at a time then you’re going to be twiddling your fingers an awful lot. So you send ten, or twenty, and you wait for signs of progress before you send more.

With coroutines it’s not the network but the L1 cache. You’re better off running a function a dozen times and then running another than running each in turn.
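
A sketch of that batching idea (hypothetical stages, chunk size picked arbitrarily): run one stage over a cache-sized chunk before switching to the next, instead of alternating per item.

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    void stage_a(int& x) { x *= 2; }  // stand-ins for real work
    void stage_b(int& x) { x += 1; }

    // Per-item switching: the two stages' code and data keep evicting each other.
    void interleaved(std::vector<int>& items) {
        for (int& x : items) { stage_a(x); stage_b(x); }
    }

    // Batched: each stage stays hot in the cache while it runs over a chunk.
    void batched(std::vector<int>& items) {
        constexpr std::size_t chunk = 4096;
        for (std::size_t i = 0; i < items.size(); i += chunk) {
            const std::size_t end = std::min(items.size(), i + chunk);
            for (std::size_t j = i; j < end; ++j) stage_a(items[j]);
            for (std::size_t j = i; j < end; ++j) stage_b(items[j]);
        }
    }

    int main() {
        std::vector<int> items(1 << 20, 1);
        batched(items);                       // same result as interleaved(), different cache behavior
        std::printf("%d\n", items.front());   // 1*2 + 1 = 3
    }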

replies(1): >>43199563 #
10. hinkley ◴[] No.43196896{3}[source]
I was doing some Java code recently after spending a decade in async code, and boy, those first few minutes were like jumping into a cold pool. It took me a moment to switch gears back to everything being blocking and that function just taking 500ms sometimes, waiting for IO.

11. bigmutant ◴[] No.43197344[source]
The fundamental problems are communication lag and the lack of information about why issues occur (encapsulated by the Byzantine Generals problem). I like to imagine trying to build a fault-tolerant, reliable system for the Solar System. Would the techniques we use today (retries, timeouts, etc.) really be adequate when the lag is upwards of hours instead of milliseconds? But that's the crux of these systems: coordination (mostly) works because the systems are close together (same board, at most the same DC).

12. jimbokun ◴[] No.43197661[source]
The author critiques having sequential code executing on individual nodes, uninformed by the larger distributed algorithm in which it plays a part.

However, I think there are great advantages to that style. It’s easier to analyze and test the sequential code for correctness. Then it writes a Kafka message or makes an HTTP call and doesn’t need to be concerned with whatever is handling the next step in the process.

Then assembling the sequential components once they are all working individually is a much simpler task.
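
One way to read that style (a sketch; the transport below is a placeholder, not a real Kafka or HTTP client): keep each step a pure, easily tested function, and let a thin shell own the messaging to the next stage.

    #include <cstdio>
    #include <string>

    // The message this step emits for whatever handles the next stage.
    struct OrderPriced {
        std::string order_id;
        long total_cents;
    };

    // Pure, sequential business logic: trivial to unit test, no I/O, no concurrency.
    OrderPriced price_order(const std::string& order_id, long unit_cents, int quantity) {
        return OrderPriced{order_id, unit_cents * quantity};
    }

    // Thin shell: only this part knows about the transport (Kafka, HTTP, ...).
    // publish() stands in for whatever client the real system would use.
    void publish(const OrderPriced& msg) {
        std::printf("emit OrderPriced{%s, %ld}\n", msg.order_id.c_str(), msg.total_cents);
    }

    int main() {
        publish(price_order("o-123", 499, 3));  // the next step consumes this downstream
    }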

13. cmrdporcupine ◴[] No.43197684[source]
What we need is for formal verification tools (for linearizability, etc.) to be far better understood and more common.

14. gmadsen ◴[] No.43199563{4}[source]
Fair enough; that was the design choice C++ went with to avoid breaking ABI and to have movable coroutine handles.

Rust accepted the tradeoff and can do pure-stack async.

There are things you can do in C++ to keep the dynamic allocation off the heap, but it requires a custom allocator + predefining the size of the coroutines.

https://pigweed.dev/docs/blog/05-coroutines.html

15. vacuity ◴[] No.43199865[source]
I think trying to shoehorn everything into sequential, imperative code is a mistake. The burden of performance should fall on the programmer's cognitive load, aided where possible by the computer. Hardware should indeed be simple, but not molded to current assumptions. It's true that concurrency in its various fashions, and the attempts at standardizing it, are taxing on programmers. However, I posit this is largely essential complexity, and we should accept that big problems deserve focus and commitment. People malign frameworks and standards (obligatory https://xkcd.com/927), but the answer is not shying away from them but rather leveraging them while being flexible.
16. linkregister ◴[] No.43207818{3}[source]
> 8+6+4+4+3+2+1.5+1.5

What is this referring to? It sounds like a fascinating problem.

replies(1): >>43209618 #
17. gue5t ◴[] No.43208672{3}[source]
Personally I've been inspired by nnethercote's logs (https://nnethercote.github.io/) of incremental single-digit percentage performance improvements to rustc over the past several years. The serial portion of compilers is still quite significant and efforts to e.g. parallelize the entire rustc frontend are heroic slogs that have run into subtle semantic problems (deadlocks and races) that have made it very hard to land them. Not to disparage those working on that approach, but it is really difficult! Meanwhile, dozens of small speedups accumulate to really significant performance improvements over time.
18. EtCepeyd ◴[] No.43209618{4}[source]
>> When people find out I got a 30% improvement by doing 8+6+4+4+3+2+1.5+1.5

> What is this referring to?

30 = 8+6+4+4+3+2+1.5+1.5