Performance per watt is increasing due to improvements in lithography.
Also, the Jevons paradox.
Traditionally, x86 chips have been designed powerful and power-hungry first and then scaled down, whereas ARM designs started small and efficient and were scaled up.
For whatever reason, this also makes it possible to get much bigger YoY performance gains on ARM. The Apple M4 is a mature design[0], and yet a year later the M5 delivers +15% CPU, +30% GPU, and +28% memory bandwidth.
The Snapdragon Elite X series is showing a similar trajectory.
So Jim Keller ended up being wrong that ISA doesn't matter. It's just that it's the people working with the ISA that matter, not the silicon.
[0] Its design traces all the way back to the A12 from 2018, and in some fundamental ways even to the A10 from 2016.
I would need some strong evidence to make me think it isn't the ISA that makes the difference.
Basically, x86 uses op caches and micro-ops, which reduce instruction-decoder use; the decoder itself doesn't consume significant power anyway; and ARM designs also use op caches and micro-ops to improve performance. So there is little effective difference at the decode stage. Micro-ops and branch prediction are where the big wins are, and both ISAs use them extensively.
If the hardware is equal and the designers are equally skilled, yet one ISA consistently pulls ahead, the likely conclusion is that the way the chips get designed must be different for teams using the winning ISA.
For what it's worth, the same thing is happening in GPU land. Infamously, the M1 Ultra GPU at 120 W matches the performance of the RTX 3090 at 320 W (!).
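Taking the comment's numbers at face value (equal GPU performance at very different power draws), the implied efficiency gap is easy to work out; this is just a back-of-the-envelope sketch, not a benchmark:

```python
# Figures from the claim above: roughly equal GPU performance at these
# power draws (not my own measurements).
m1_ultra_watts = 120
rtx_3090_watts = 320

# With equal performance, the perf/W advantage is just the inverse
# ratio of the power draws.
perf_per_watt_advantage = rtx_3090_watts / m1_ultra_watts
print(f"Implied M1 Ultra perf/W advantage: {perf_per_watt_advantage:.2f}x")
```

That works out to roughly a 2.7x perf/W advantage, assuming the equal-performance claim holds.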
That same M1 also smoked an Intel i9.