That's what the Eytzinger ordering is for. Using 1-based indexing like the article, a single lookup will only hit the following nodes:
1
2 or 3 (let's assume 2)
4 or 5 (let's assume 5)
10 or 11
Remember that the two candidates at each step (2k and 2k+1) are adjacent in memory. So the nodes near the root stay in cache (regardless of which particular path through them was taken), and the deeper nodes are in cache if you've recently looked up a similar key.
There won't be any further improvement from using a B-tree, which only scatters the memory accesses further. (If anything, you might consider using a higher-base Eytzinger tree for SIMD reasons, but I've never actually seen anyone do this.)
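
To make the access pattern concrete, here is a minimal sketch in Python (not taken from the article; the names `eytzinger` and `lookup_path` are made up for illustration). It assumes 15 sorted keys, lays them out in 1-based Eytzinger order with slot 0 unused, and records which indices a lookup touches:

```python
def eytzinger(sorted_vals):
    """Lay out sorted values in 1-based Eytzinger (BFS) order; slot 0 is unused."""
    n = len(sorted_vals)
    out = [None] * (n + 1)
    it = iter(sorted_vals)

    def fill(k):
        # In-order walk of the implicit tree: the children of k are 2k and 2k+1.
        if k <= n:
            fill(2 * k)
            out[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return out


def lookup_path(tree, key):
    """Return the indices visited while searching for key."""
    path = []
    k = 1
    while k < len(tree):
        path.append(k)
        if tree[k] == key:
            break
        # Go left (2k) or right (2k+1); the two candidates sit side by side.
        k = 2 * k + 1 if key > tree[k] else 2 * k
    return path


tree = eytzinger(list(range(1, 16)))  # hypothetical data: keys 1..15
print(lookup_path(tree, 5))  # [1, 2, 5, 10]
print(lookup_path(tree, 7))  # [1, 2, 5, 11]
```

Keys 5 and 7 share the prefix 1, 2, 5 and differ only in the last step, where they land on the neighbouring slots 10 and 11. That is the "similar key, same cache lines" effect in miniature.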