1. Even with sync.RWMutex and a specialized policy, you're unlikely to outperform a well-implemented BP-Wrapper on latency or throughput.
2. I've never seen W-TinyLFU lose more than 2-3% of hit rate to a simpler eviction policy. Most simple policies, on the other hand, are vulnerable to attacks, and their hit rate can drop by tens of percentage points when the workload changes. Even ignoring adversarial workloads, you'd still have to guess which specific policy gives you those extra few percentage points. I question the very premise of this approach.
3. Loading and refreshing are non-trivial to implement correctly. Once they're in, I'm not sure the cache could still be called "simple". And refreshing, at the very least, can reduce end-to-end latency by orders of magnitude.