Deploying a model on an NPU requires significant profile-based optimization. Picking up a model that works fine on the CPU but hasn't been optimized for an NPU usually leads to disappointing results.
Yeah, whenever I’ve spoken to people who work on stuff like IREE or OpenXLA, they’ve given me the impression that understanding how to use those compilers/runtimes is an entire job.