A quick search shows that support for this Ryzen AI NPU isn't integrated into upstream inference frameworks yet, so right now it's just dead silicon area you're paying for :-/
replies(3):
But regardless, 16 TOPS is no good for LLMs. There is a Ryzen AI demo that shows Llama 7B running on these at ~8 tokens/sec, though. A sub-par experience for a sub-par LLM.
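For context, a rough back-of-envelope on where a number like 8 tok/s comes from (my own assumed figures, not anything AMD published): single-stream decoding has to read every weight once per generated token, so it's usually capped by memory bandwidth rather than by TOPS. A minimal sketch, assuming an int8-quantized 7B model and typical laptop DDR5 bandwidth:

    # Back-of-envelope, all numbers are assumptions for illustration.
    params = 7e9            # Llama 7B parameter count
    bytes_per_param = 1     # assume int8 quantization
    weight_bytes = params * bytes_per_param

    mem_bw = 60e9           # assumed laptop DDR5 bandwidth, bytes/sec

    # Decode ceiling: every weight read once per token.
    tokens_per_sec_bw = mem_bw / weight_bytes
    print(f"bandwidth-bound ceiling: ~{tokens_per_sec_bw:.1f} tok/s")   # ~8.6

    # Compute ceiling: ~2 ops (multiply + add) per parameter per token.
    ops_per_token = 2 * params
    npu_ops = 16e12         # the advertised 16 TOPS
    tokens_per_sec_compute = npu_ops / ops_per_token
    print(f"compute-bound ceiling: ~{tokens_per_sec_compute:.0f} tok/s")  # ~1143

Under these assumptions the ~8 tok/s lines up with the memory-bandwidth ceiling, not the 16 TOPS one; the compute figure bites more during prompt prefill, which is compute-bound. Either way the end result is the sub-par experience described above.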