
149 points | whack | 1 comment | source
QuaternionsBhop No.45780916
Since the CPU handles cache coherency transparently, perhaps there should be a way for an application to promise it is well-behaved in exchange for access to a lower-level, non-transparent instruction set that manages cache coherency manually from the application level. Or perhaps applications can never be trusted with that level of control over the hardware. The MESI model reminded me of Rust's ownership and borrowing. The same pattern appears in OpenGL vs. Vulkan drivers (implicit vs. explicit sync), and again in the cache management work involved in squeezing maximum throughput out of CUDA on an enterprise GPU.
cpgxiii No.45781133
Newer processors do expose some knobs for cache control, mostly to partition or reserve cache space, either to improve security or to reduce cache contention between processes.
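
On Linux, those partitioning knobs (Intel CAT / AMD cache QoS) are driven through the resctrl filesystem. A minimal sketch, assuming a kernel with resctrl support mounted at /sys/fs/resctrl and run as root; the group name "lowlat" and the capacity mask are illustrative values, not anything mandated by the interface:

    /* Sketch: reserve part of the L3 for one workload via Linux
       resctrl. A resource group is just a directory; writing its
       "schemata" file sets the cache ways its tasks may use. */
    #include <stdio.h>
    #include <errno.h>
    #include <sys/stat.h>

    int main(void) {
        /* Create a resource group (hypothetical name "lowlat"). */
        if (mkdir("/sys/fs/resctrl/lowlat", 0755) != 0 && errno != EEXIST) {
            perror("mkdir");
            return 1;
        }
        /* Confine cache domain 0 of this group to 4 L3 ways (mask 0xf).
           Tasks added to the group's "tasks" file are limited to them. */
        FILE *f = fopen("/sys/fs/resctrl/lowlat/schemata", "w");
        if (!f) {
            perror("fopen");
            return 1;
        }
        fprintf(f, "L3:0=f\n");
        fclose(f);
        return 0;
    }

Note that even here the kernel only lets you fence off capacity; it never hands you the coherency protocol itself.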

Actual manual cache management is far too much of an implementation detail for a general-purpose CPU to expose; doing so would tie code deeply to one specific set of processor behaviors. Cache sizes and even hierarchies change often between processor generations, and some internal cache behavior has changed within a generation via microcode updates and/or hardware steppings. Actual cache control would be like MIPS exposing delay slots, but far worse: code written against old delay-slot behavior only suffers performance problems on newer parts, while code written against old cache behavior would easily develop correctness issues.
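
For contrast, the cache operations mainstream ISAs do expose today are deliberately hint-shaped, so getting them wrong costs cycles, not correctness. A sketch using the standard x86 intrinsics; the prefetch distance and the 64-byte line size are assumptions, which is exactly the kind of assumption being warned about above:

    /* Sketch: the "safe" cache controls x86 already exposes.
       Prefetch is a pure hint; clflush is an explicit evict used
       e.g. before handing a buffer to a non-coherent DMA engine. */
    #include <immintrin.h>
    #include <stddef.h>

    void scale(float *dst, const float *src, size_t n, float k) {
        for (size_t i = 0; i < n; i++) {
            /* Hint: pull a line we'll want soon toward the core.
               A wrong distance just wastes bandwidth. */
            if (i + 16 < n)
                _mm_prefetch((const char *)&src[i + 16], _MM_HINT_T0);
            dst[i] = src[i] * k;
        }
        /* Explicitly flush dst's lines back to memory, assuming
           64-byte cache lines (16 floats per line). */
        for (size_t i = 0; i < n; i += 64 / sizeof(float))
            _mm_clflush(&dst[i]);
        _mm_mfence(); /* order the flushes before later accesses */
    }

If the prefetch distance or line size is wrong for a given part, the code still computes the right answer; that property is what full manual coherency control would give up.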

Really, the only way to make this work is for the final compilation/"specialization" step to occur on the specific device in question: a processor using binary translation (e.g. Transmeta, Nvidia Denver), one using specialization (e.g. Mill), or a system that effectively enforces runtime compilation (e.g. runtime shader/program compilation in OpenGL and OpenCL).
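
The OpenCL case is the easiest to see concretely: kernel source ships with the application and only becomes machine code when clBuildProgram runs against the device actually present, so the vendor compiler can bake in real cache and SIMD parameters. A minimal sketch, with error handling mostly elided for brevity:

    /* Sketch: runtime specialization in OpenCL. The kernel string
       below is compiled on the end user's machine, for whatever
       device the runtime reports. */
    #include <CL/cl.h>
    #include <stdio.h>

    static const char *src =
        "__kernel void scale(__global float *x, float k) {"
        "    size_t i = get_global_id(0);"
        "    x[i] *= k;"
        "}";

    int main(void) {
        cl_platform_id plat;
        cl_device_id dev;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);

        /* The "specialization" step: compiled here, for this device. */
        if (clBuildProgram(prog, 1, &dev, NULL, NULL, NULL) != CL_SUCCESS)
            fprintf(stderr, "build failed\n");

        clReleaseProgram(prog);
        clReleaseContext(ctx);
        return 0;
    }

Because the binary never leaves the machine it was built on, implementation details like cache geometry can be exploited freely without ever becoming part of the application's portable contract.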