In general, MAC unit utilization tends to be low for transformers, but 1.3% seems pretty bad. I wonder if they fucked up the memory interface for the NPU. All the MACs in the world are useless if you cannot feed them.
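A rough roofline sketch of the "can't feed them" point. The bandwidth figure is a hypothetical assumption, not a spec for any particular chip. It also assumes int8 weights (1 byte each) streamed from DRAM once per decode step, with each weight used for one multiply-accumulate (2 ops) per sequence in the batch. Under those assumptions, attainable throughput is min(peak, bandwidth × ops-per-byte). At small batch sizes the memory term dominates long before the "45 TOPS" peak does:

```python
def roofline_utilization(peak_ops_per_s: float, mem_bytes_per_s: float, ops_per_byte: float) -> float:
    """Fraction of peak MAC throughput attainable under a simple roofline model."""
    attainable = min(peak_ops_per_s, mem_bytes_per_s * ops_per_byte)
    return attainable / peak_ops_per_s


PEAK_OPS = 45e12  # the "45 TOPS" headline figure
MEM_BW = 120e9    # ASSUMPTION: hypothetical 120 GB/s DRAM bandwidth

# ASSUMPTION: int8 weights, each read once per decode step and used for
# one MAC (2 ops) per sequence in the batch, so intensity = 2 * batch ops/byte.
for batch in (1, 8, 64):
    intensity = 2.0 * batch
    u = roofline_utilization(PEAK_OPS, MEM_BW, intensity)
    print(f"batch={batch:3d}  utilization <= {u:.2%}")
```

This ignores KV-cache traffic, activations, and on-chip caching, so treat it as an optimistic upper bound rather than a prediction.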
So, OK, yeah, I concede that the NPU may have even worse access to memory than the CPU, but the bottom line is that neither one of them has anything close to what it needs to actually deliver anything like the marketing headline performance number on any realistic workload.
I bet a lot of people have bought those things after seeing "45 TOPS", thinking that they'd be able to usefully run transformers the size of main memory, and that's not happening on CPU or NPU.
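Back-of-envelope for why "the size of main memory" is the killer: at batch 1, every weight byte has to come out of DRAM once per generated token. The decode rate is therefore capped at bandwidth / model size, whatever the TOPS figure says. Both numbers below are hypothetical assumptions chosen for illustration:

```python
def max_decode_tokens_per_s(model_bytes: float, mem_bytes_per_s: float) -> float:
    """Upper bound on batch-1 decode rate: each weight byte is read once per token."""
    return mem_bytes_per_s / model_bytes


MODEL_BYTES = 32e9  # ASSUMPTION: weights filling a hypothetical 32 GB of main memory
MEM_BW = 120e9      # ASSUMPTION: hypothetical 120 GB/s DRAM bandwidth

rate = max_decode_tokens_per_s(MODEL_BYTES, MEM_BW)
print(f"at most {rate:.2f} tokens/s, regardless of TOPS")
```

Note that the peak compute rate never appears in the formula. Adding MACs doesn't move this bound. Only more bandwidth or smaller (e.g. more aggressively quantized) weights do.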