Unfortunately, not really. I worked in HPC when it was developed as a concept there, which is where I learned it. I brought it over into databases, my primary area of expertise, because I saw an obvious crossover application to some of the scaling challenges there. Over time, other people have adopted the ideas, but a lot of database R&D is never published.
Writing a series of articles about the history and theory of thread-per-core software architecture has been on my eternal TODO list. HPC in particular is famous for doing a lot of interesting research but rarely publishing it, in part due to its historical national security ties.
The original thought exercise was “what if we treated every core like a node in a supercomputing cluster?” because classical multithreading scaled poorly on early multi-core systems once core counts reached 8+. The difference is that some things are much cheaper to move between cores than between nodes of an HPC cluster, so you adapt the architecture to exploit the operations that are cheap within a single machine, things you would never do across a cluster, while still keeping the cluster abstraction.
As an example, while moving work across cores is relatively expensive (e.g. work stealing), moving data across cores is relatively cheap and low-contention, since it rides shared memory and the cache-coherent interconnect rather than a network. The design problem then becomes how to make moving data between cores maximally cheap, especially given modern hardware. It turns out that these problems have elegant solutions in most cases.
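To make the shape of that concrete, here is a minimal sketch in Rust of the shared-nothing, thread-per-core layout described above. All the names (`Request`, the shard count, the toy routing function) are illustrative rather than from any particular system; a real runtime would also pin each thread to a core (e.g. via a crate like `core_affinity`) and use cache-friendly SPSC queues instead of the std `mpsc` channels used here for brevity. Each thread owns one shard of the data outright, with no locks, and other threads move data (messages) to the owner rather than moving work between cores:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread;

// Messages are the only thing that crosses cores; the data itself
// never has a concurrent owner.
enum Request {
    Put(String, String),
    Get(String, Sender<Option<String>>), // reply channel: data moves back to the caller
}

fn main() {
    let shards = 4; // stand-in for the machine's core count
    let mut senders: Vec<Sender<Request>> = Vec::new();

    for _ in 0..shards {
        let (tx, rx) = mpsc::channel::<Request>();
        senders.push(tx);
        // Each "core" runs its own single-threaded event loop over its
        // private shard -- the cluster-node abstraction, inside one box.
        thread::spawn(move || {
            let mut shard: HashMap<String, String> = HashMap::new();
            for req in rx {
                match req {
                    Request::Put(k, v) => {
                        shard.insert(k, v);
                    }
                    Request::Get(k, reply) => {
                        let _ = reply.send(shard.get(&k).cloned());
                    }
                }
            }
        });
    }

    // Route each key to its owning shard, exactly as you would route a
    // request to a node in a cluster (here: a toy hash on key length).
    let route = |key: &str| key.len() % shards;

    let key = "hello".to_string();
    senders[route(&key)]
        .send(Request::Put(key.clone(), "world".into()))
        .unwrap();

    let (reply_tx, reply_rx) = mpsc::channel();
    senders[route(&key)].send(Request::Get(key, reply_tx)).unwrap();
    println!("{:?}", reply_rx.recv().unwrap()); // prints Some("world")
}
```

The key design choice is that a key’s owner never changes, so every shard runs as an uncontended single-threaded loop; the only cross-core traffic is the messages themselves, which is exactly the part you then optimize for modern hardware.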
There isn’t a one-size-fits-all architecture, but you can arrive at architectures with broad applicability. They just don’t look like the architectures you learn at university.