
486 points dbreunig | 1 comment
1. pzo No.41867674
Haven't played much with the Qualcomm NPU, but the Apple Neural Engine available in iOS and macOS was significantly faster than the CPU or GPU for many computer vision models (e.g. MediaPipe models, YOLO, Depth-Anything) - to the point that inference on a MacBook M2 Max's NPU (the same Neural Engine found in older iPhones) was much faster than executing on all 38 GPU cores.
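
To see this yourself, here's a rough sketch (assuming coremltools is installed; "model.mlpackage" and the input name/shape are placeholders for whatever model you converted) that loads the same Core ML model pinned to different compute units and compares latency:

    import time
    import numpy as np
    import coremltools as ct

    # Load the same model twice, restricted to different compute units.
    # "model.mlpackage" is a placeholder for any converted Core ML model.
    ne_model = ct.models.MLModel("model.mlpackage",
                                 compute_units=ct.ComputeUnit.CPU_AND_NE)
    gpu_model = ct.models.MLModel("model.mlpackage",
                                  compute_units=ct.ComputeUnit.CPU_AND_GPU)

    # Input name and shape depend on the model; these are assumptions.
    x = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

    for name, model in [("NPU", ne_model), ("GPU", gpu_model)]:
        model.predict(x)  # warm-up: first call loads/compiles the model
        start = time.perf_counter()
        for _ in range(100):
            model.predict(x)
        ms = (time.perf_counter() - start) / 100 * 1000
        print(f"{name}: {ms:.2f} ms per inference")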

This all depends on model architecture, conversion, and tuning. Apple provides good tooling in Xcode for benchmarking models, down to the execution time of individual operators and which unit (CPU, GPU, NPU) each operator ran on, in case it couldn't execute on the NPU and had to fall back to the CPU/GPU. Sometimes a model has to be tweaked to use a slightly different operator if the original one isn't available on the NPU. On top of that, ML frameworks/runtimes such as ONNX/PyTorch/TensorFlow Lite sometimes don't implement all of their operators for CoreML or MPS.
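
For the conversion side, a minimal sketch (assuming torch, torchvision, and coremltools; the model and tensor names are stand-ins) of going from PyTorch to Core ML - if an operator has no Core ML equivalent, ct.convert fails with an unsupported-op error, and that's where you swap in a supported layer:

    import torch
    import torchvision
    import coremltools as ct

    # Any torchvision model works here as a stand-in for your CV model.
    model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
    example = torch.rand(1, 3, 224, 224)
    traced = torch.jit.trace(model, example)

    # Conversion raises an error if an op has no Core ML equivalent;
    # that's the point where the model needs tweaking.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input", shape=example.shape)],
        compute_units=ct.ComputeUnit.ALL,  # let Core ML pick CPU/GPU/NPU
    )
    mlmodel.save("model.mlpackage")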