577 points simonw | 1 comments | | HN request time: 0.212s | source
1. jauntywundrkind ◴[] No.44730540[source]
MLX does have decent software support among ML stacks. Targeting both iOS and macOS is a big win in itself.

I wonder what's possible, and what the software situation looks like today, with the PC NPUs. AMD's XDNA has been around for a while, and XDNA2 jumps from 10 to 40 TOPS. AMD's iGPUs can address huge amounts of system memory: is it similar here? The "amdxdna" driver was merged in Linux 6.14 last winter: where are we now?
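One reason the TOPS number matters less than the memory question: LLM token generation is typically memory-bandwidth bound, since every weight has to be streamed from RAM once per token. A back-of-envelope sketch, with hypothetical numbers (the model size, quantization, and bandwidth figures here are illustrative assumptions, not measured specs for any XDNA part):

```python
# Rough decode-speed ceiling for a weight-streaming-bound LLM.
# All numbers below are assumptions for illustration.
params = 7e9              # 7B-parameter model
bytes_per_weight = 0.5    # ~4-bit quantized weights
bandwidth = 100e9         # assume ~100 GB/s of shared LPDDR bandwidth

bytes_per_token = params * bytes_per_weight   # every weight read once per token
tokens_per_sec = bandwidth / bytes_per_token
print(round(tokens_per_sec, 1))  # ~28.6 tokens/sec upper bound
```

By this estimate the bandwidth of the shared memory, not the 10-vs-40 TOPS compute jump, sets the decode ceiling, which is why NPU access to the full system memory pool is the interesting question.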

But I'm not seeing any evidence of support in the main frameworks. https://github.com/ggml-org/llama.cpp/issues/1499 https://github.com/ollama/ollama/issues/5186

Good news: AMD has an initial llama.cpp implementation. I don't know exactly what that means in practice, but the first-gen NPU supports W4ABF16 quantization, and newer chips support W8A16. https://github.com/ggml-org/llama.cpp/issues/14377 . I'm not sure what it's good for, but there is also a Linux "xdna-driver": https://github.com/amd/xdna-driver . And IREE has an experimental backend: https://github.com/nod-ai/iree-amd-aie
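For anyone puzzling over those names: W4ABF16 conventionally reads as 4-bit weights with bfloat16 activations, and W8A16 as 8-bit weights with 16-bit activations. A minimal sketch of what group-wise 4-bit weight quantization looks like (this is a generic illustration of the technique, not AMD's actual kernel format):

```python
import numpy as np

def quantize_w4(w, group_size=32):
    """Group-wise asymmetric 4-bit quantization (illustrative only)."""
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0          # 4 bits -> 16 levels per group
    q = np.clip(np.round((w - wmin) / scale), 0, 15).astype(np.uint8)
    return q, scale, wmin

def dequantize_w4(q, scale, wmin):
    return q.astype(np.float32) * scale + wmin

w = np.random.randn(4, 32).astype(np.float32)
q, s, z = quantize_w4(w)
w_hat = dequantize_w4(q, s, z).reshape(w.shape)
# weights shrink ~8x vs fp32 (plus a per-group scale and zero point),
# and each element lands within half a quantization step of the original
print(np.abs(w - w_hat).max())
```

The point of the "A" half of the name is that only the weights get crushed to 4 bits; the matmuls still run against bf16 activations, which is what the NPU's datapath apparently supports.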

There are a lot of other folks also starting on their NPU journeys. ARM's Ethos and Rockchip's RKNN recently shipped Linux kernel drivers, but it feels like that's just a start? https://www.phoronix.com/news/Arm-Ethos-NPU-Accel-Driver https://www.phoronix.com/news/Rockchip-NPU-Driver-RKNN-2025