
jazzyjackson No.42154393
I sent mine back. I thought the NPU would help with local LLMs, but there's nothing that utilizes it yet; LM Studio has it on the roadmap, but it was a bit of a letdown. M1 MacBook was 30 times faster at generating tokens.

Happy with my gen 11 X1 Carbon (the one before they put the power button on the outside edge like a tablet?!?)

woadwarrior01 No.42155356
I just got NPU-based LLM inference working locally on the Snapdragon X Elite with small (3B and 8B) models, but it's not quite production-ready yet. I know all the llama.cpp wrappers claim to have it on their roadmap, but the fact of the matter is that they have no clue how to implement it.
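
For a concrete flavour, here's a rough sketch of one publicly documented route to the Hexagon NPU: ONNX Runtime's QNN execution provider. This isn't necessarily the stack I used, and the model file and input name below are placeholders, not real artifacts.

    # Sketch: run a small quantized ONNX model on the Snapdragon NPU via
    # ONNX Runtime's QNN execution provider (onnxruntime-qnn on Windows on ARM).
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "llama-3.2-3b-int8.onnx",  # hypothetical pre-quantized model file
        providers=[
            # "QnnHtp.dll" selects the HTP (Hexagon NPU) backend; anything the
            # NPU can't run falls back to the CPU provider.
            ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
            "CPUExecutionProvider",
        ],
    )

    # A single forward pass over some token ids. Real LLM inference also needs
    # a KV cache and a sampling loop, omitted to keep the sketch short.
    input_ids = np.array([[1, 15043, 29892, 920, 526, 366]], dtype=np.int64)
    logits = session.run(None, {"input_ids": input_ids})[0]
    print(logits.shape)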

> M1 MacBook was 30 times faster at generating tokens.

Apples and oranges (pardon the pun). llama.cpp (and, in turn, LM Studio) uses Metal GPU acceleration on Apple Silicon, while it currently only does CPU inference on Snapdragon.
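
To illustrate the split with the llama-cpp-python bindings (a minimal sketch, not anything from LM Studio; the GGUF filename is a placeholder): the same call offloads to Metal on an Apple Silicon build and quietly runs CPU-only on today's Snapdragon builds.

    # Sketch: identical code, different backends depending on how llama.cpp
    # was compiled. The model filename is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local model
        n_gpu_layers=-1,  # offload all layers: Metal on an Apple Silicon build,
                          # effectively ignored on a CPU-only Snapdragon build
        n_ctx=4096,
    )

    out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
    print(out["choices"][0]["text"])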

It’s possible to use the Adreno GPU for LLM inference (I demoed this at the Snapdragon Summit), which performs better.