
486 points dbreunig | 4 comments
eightysixfour ◴[] No.41863546[source]
I thought the purpose of these things was not to be fast, but to run small models with very little power usage? I have a newer AMD laptop with an NPU, and my power usage doesn't change when using the video effects that supposedly run on it, but it goes up when using the NVIDIA Studio effects.

It seems like the NPUs are for heavily optimized models that do small tasks, like eye contact correction, background blur, autocorrect, transcription, and OCR. In particular, on Windows, I assumed they were running the full-screen OCR (and maybe the embeddings for search) for the rewind feature.

replies(7): >>41863632 #>>41863779 #>>41863821 #>>41863886 #>>41864628 #>>41864828 #>>41869772 #
conradev ◴[] No.41863632[source]
That is my understanding as well: low power and low latency.

You can see this in action when evaluating a Core ML model on a macOS machine: the ANE takes half as long as the GPU, which takes half as long as the CPU (the actual factors being model-dependent).
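
A minimal sketch of that comparison, restricting the compute units Core ML is allowed to use. The "model.mlmodelc" path and the empty input provider are placeholders; a real model needs its actual input features, and the first prediction includes load/compile warm-up, so in practice you'd average several runs:

    import CoreML
    import Foundation

    func timePrediction(_ units: MLComputeUnits, _ label: String) throws {
        let config = MLModelConfiguration()
        config.computeUnits = units
        // Placeholder path: point this at any compiled Core ML model.
        let model = try MLModel(contentsOf: URL(fileURLWithPath: "model.mlmodelc"),
                                configuration: config)
        // Placeholder input: a real model needs its actual features here.
        let input = try MLDictionaryFeatureProvider(dictionary: [:])
        let start = Date()
        _ = try model.prediction(from: input)
        print("\(label): \(Date().timeIntervalSince(start) * 1000) ms")
    }

    try! timePrediction(.cpuOnly, "CPU only")
    try! timePrediction(.cpuAndGPU, "CPU + GPU")
    try! timePrediction(.cpuAndNeuralEngine, "CPU + ANE")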

replies(1): >>41863665 #
1. nickpsecurity ◴[] No.41863665[source]
To take half as long, doesn’t it have to perform twice as fast? Or am I misreading your comment?
replies(2): >>41863726 #>>41865127 #
2. eightysixfour ◴[] No.41863726[source]
No, you can have latency that is independent of compute performance. The CPU/GPU may have other tasks, so the work has to wait for existing threads to finish or for the hardware to clock up, or it may go through slower memory paths, etc.

If you and I have the same calculator but I'm working on a set of problems and you're not, and we're both asked to do some math, it may take me longer to return an answer, even though the instantaneous performance of the math is the same.
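
A toy sketch of that idea: the math costs the same CPU time on both queues, but one queue is already occupied with unrelated work (the 0.5 s sleep and the sum are arbitrary stand-ins), so the answer comes back later:

    import Foundation

    // Same "calculator", different wait before it gets to our problem.
    func doMath() -> Int { (1...1_000_000).reduce(0, +) }

    let idle = DispatchQueue(label: "idle")
    let busy = DispatchQueue(label: "busy")

    // Occupy the busy queue with an unrelated task first.
    busy.async { Thread.sleep(forTimeInterval: 0.5) }

    for (name, queue) in [("idle", idle), ("busy", busy)] {
        let submitted = Date()
        queue.sync {
            let waited = Date().timeIntervalSince(submitted)
            _ = doMath()
            print("\(name) queue: waited \(waited)s before the math even started")
        }
    }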

replies(1): >>41863792 #
3. refulgentis ◴[] No.41863792[source]
In isolation, makes sense.

Wouldn't it be odd for OP to present examples that are the opposite of their claim, just to get us thinking "well, the CPU is busy"?

Curious for their input.

4. conradev ◴[] No.41865127[source]
The GPU is stateful and requires loading shaders and initializing pipelines before doing any work. That is where its latency comes from. It is also extremely power hungry.
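
A rough sketch of that setup cost with Metal; the kernel here is a trivial stand-in rather than a real workload, since the interesting part is how much has to be created and compiled before any work can be dispatched:

    import Foundation
    import Metal

    // Stand-in kernel: the point is the setup, not the math.
    let source = """
    #include <metal_stdlib>
    using namespace metal;
    kernel void scale(device float *data [[buffer(0)]],
                      uint id [[thread_position_in_grid]]) {
        data[id] *= 2.0;
    }
    """

    let start = Date()
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue() else {
        fatalError("no Metal device")
    }
    let library = try! device.makeLibrary(source: source, options: nil)     // shader compile
    let function = library.makeFunction(name: "scale")!
    let pipeline = try! device.makeComputePipelineState(function: function) // pipeline init
    print("GPU ready for work after \(Date().timeIntervalSince(start) * 1000) ms")
    _ = (queue, pipeline) // only now can a command buffer encode the actual dispatch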

The CPU is essentially zero latency to get started, but it isn't specialized for any one task and isn't massively parallel, which is why it takes even longer overall.

The NPU often has a simpler bytecode, with more complex operations like matrix multiplication implemented directly in hardware, rather than having to instantiate a generic compute kernel on the GPU.