
151 points ibobev | 1 comment
vacuity No.45652410
There are no hard rules; use principles flexibly.

That being said, some things are generally true over the long term: use one pinned thread per core, maximize locality (of data and code, wherever relevant), and use asynchronous programming where performance demands it. To tie in the OP, give each entity the control that is due to it (here, the scheduler). Cross-core data movement was never the enemy, but unprincipled cross-core data movement can be. If evenly distributing work matters, work-stealing is excellent, as long as it's done carefully. Details like how concurrency is implemented (shared state, here) or who controls the data are specific to the circumstances. A sketch of the per-core pinning pattern follows.
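
For concreteness, here is a minimal Linux-specific sketch of the "one pinned thread per core" pattern, assuming glibc's pthread_setaffinity_np; the worker body is a placeholder, not any particular runtime's scheduler:

    /* One pinned thread per online core (Linux/glibc).
     * Build with: cc -pthread pin.c */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <unistd.h>

    static void *worker(void *arg) {
        long core = (long)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        /* Pin the calling thread to exactly one core. */
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        /* ...per-core event loop / work-stealing scheduler goes here... */
        return NULL;
    }

    int main(void) {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        if (n > 64) n = 64;  /* keep the sketch simple */
        pthread_t t[64];
        for (long i = 0; i < n; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (long i = 0; i < n; i++)
            pthread_join(t[i], NULL);
        return 0;
    }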

replies(1): >>45661510 #
AaronAPU No.45661510
I did mass-scale performance benchmarking on highly optimized workloads using lock-free queues and fibers, and pinning to a core was almost never faster. There were a few topologies where it was, but they were outliers.

This was across a wide variety of Intel, AMD, NUMA, and ARM processors with different architectures, OSes, and memory configurations.

Part of the reason is hyper-threading (or Threadripper-type architectures), but even pinning to groups of cores usually wasn't faster (see the group-affinity sketch after this comment).

This was even more the case when competing workloads were stealing cores from the OS scheduler.
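
For reference, "pinning to groups" above means widening the affinity mask to a set of cores rather than one. A hedged sketch, assuming a hypothetical 8-core group (cores 0-7, e.g. one NUMA node) on Linux/glibc:

    /* Pin the calling thread to a group of cores instead of one. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    static void pin_to_group(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int c = 0; c < 8; c++)  /* cores 0..7: assumed topology */
            CPU_SET(c, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(void) {
        pin_to_group();
        /* ...do group-local work; the OS may still migrate the
         * thread within the group... */
        return 0;
    }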

replies(4): >>45661808 #>>45662327 #>>45663519 #>>45669765 #
1. menaerus No.45669765
What type of workloads?