That being said, there are some things that are generally true for the long term: use a pinned thread per core, maximize locality (of data and code, wherever relevant), use asynchronous programming if performance is necessary. To incorporate the OP, give control where it's due to each entity (here, the scheduler). Cross-core data movement was never the enemy, but unprincipled cross-core data movement can be. If even distribution of work is important, work-stealing is excellent, as long as it's done carefully. Details like how concurrency is implemented (shared-state, here) or who controls the data are specific to the circumstances.