and - as the OP suggests - it works best when the cache is a well-defined abstraction with properties and rules about how it works
just because "caching" is mentioned in a meme doesn't mean it can't be true that it can simplify software
Look at how CPU cache line behaviors radically change the performance of superficially similar algorithms.
Look at how query performance for a database server drops off a cliff the moment the working cache no longer fits in memory.
Hiding complexity can be a simplification, until you exceed the bounds of the simplification and the complexity you hid demands your attention anyway.
There's a long history in computer architecture of cores and accelerators that don't have a cache but instead rely on explicitly programmed local scratchpads. They are universally more difficult to program than general purpose CPUs because of that.