
151 points | ibobev | 1 comment
jandrewrogers No.45660889
I've worked on several thread-per-core systems that were purpose-built for extreme dynamic data and load skew. They work beautifully at very high scales on the largest hardware. There are idiomatic architectures at this point for designing thread-per-core systems that distribute load uniformly without work-stealing or high-touch thread coordination. People have been putting thread-per-core architectures in production for 15+ years now and the designs have evolved dramatically.

The architectures from circa 2010 were a bit rough. While the article has some validity for architectures from 10+ years ago, the state-of-the-art for thread-per-core today looks nothing like those architectures and largely doesn't have the issues raised.

News of thread-per-core's demise has been greatly exaggerated. The benefits have measurably increased in practice as the hardware has evolved, especially for ultra-scale data infrastructure.

FridgeSeal No.45661630
Are there any resources or learning material about the more modern thread-per-core approaches? It’s a particular area of interest for me, but I’ve had relatively little success finding material on it, so I assume there’s a lot of tightly guarded institutional knowledge.
jandrewrogers No.45664476
Unfortunately, not really. I worked in HPC when it was developed as a concept there, which is where I learned it. I brought it over into databases, my primary area of expertise, because I saw the obvious cross-over application to some scaling challenges there. Over time, other people have adopted the ideas, but a lot of database R&D is never published.

Writing a series of articles about the history and theory of thread-per-core software architecture has been on my eternal TODO list. HPC in particular is famously an area of software that does a lot of interesting research but rarely publishes, in part due to its historical national security ties.

The original thought exercise was “what if we treated every core like a node in a supercomputing cluster”, because classical multithreading was scaling poorly on early multi-core systems once core counts reached 8+. The difference is that some things are much cheaper to move between cores than between nodes in an HPC cluster, so you adapt the architecture to exploit the operations that are cheap on a single machine but that you would never do on a cluster, while still keeping the abstraction of a cluster.

As an example, while moving work across cores is relatively expensive (e.g. work stealing), moving data across cores is relatively cheap and low-contention. The design problem then becomes how to make moving data between cores maximally cheap, especially given modern hardware. It turns out that all of these things have elegant solutions in most cases.
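
Below is a minimal, hypothetical sketch in Rust of that shard-per-core idea (the names, the channel-based routing, and the hash partitioning are illustrative assumptions, not any particular production design): each worker thread exclusively owns one partition of the state, requests are routed to the owning worker by hashing the key, and only data moves between cores. Work is never stolen, so the per-shard state stays single-threaded and lock-free.

    use std::sync::mpsc;
    use std::thread;

    // Hypothetical shard-per-core sketch: each worker owns its partition of
    // the data outright, so no locks guard the shard-local state. Requests
    // move to the data's owner; the owner never gives up its work.
    const NUM_SHARDS: usize = 4; // stand-in for the number of cores

    fn shard_for(key: u64) -> usize {
        (key % NUM_SHARDS as u64) as usize // trivial hash partitioning
    }

    fn main() {
        let mut senders = Vec::new();
        let mut workers = Vec::new();

        for shard_id in 0..NUM_SHARDS {
            // One inbound channel per shard; the Sender handles are the only
            // shared objects. (Pinning each thread to a core is omitted; it
            // needs a platform API or an external crate.)
            let (tx, rx) = mpsc::channel::<u64>();
            senders.push(tx);
            workers.push(thread::spawn(move || {
                let mut local_sum = 0u64; // shard-local state, single-threaded
                for key in rx {
                    local_sum += key; // placeholder for real per-key work
                }
                println!("shard {shard_id} finished with sum {local_sum}");
            }));
        }

        // "Client" side: route each key to the shard that owns it.
        for key in 0..100u64 {
            senders[shard_for(key)].send(key).expect("shard is alive");
        }
        drop(senders); // closing the channels lets the workers exit

        for w in workers {
            w.join().unwrap();
        }
    }

The same shape carries over to pinned threads and lock-free SPSC rings on real systems; std::sync::mpsc is just the simplest stand-in here for a cheap per-core data path.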

There isn’t a one-size-fits-all architecture but you can arrive at architectures that have broad applicability. They just don’t look like the architectures you learn at university.

packetlost No.45669891
I'll toss $20-50 your way to bump up the priority on writing that knowledge down. The only strings attached are that it has to actually get done and be publicly available.