←back to thread

486 points dbreunig | 1 comments | | HN request time: 0.23s | source
Show context
eightysixfour ◴[] No.41863546[source]
I thought the purpose of these things was not to be fast, but to be able to run small models with very little power usage? I have a newer AMD laptop with an NPU, and my power usage doesn't change using the video effects that supposedly run on it, but goes up when using the nvidia studio effects.

It seems like the NPUs are for very optimized models that do small tasks, like eye contact, background blur, autocorrect models, transcription, and OCR. In particular, on Windows, I assumed they were running the full screen OCR (and maybe embeddings for search) for the rewind feature.

replies(7): >>41863632 #>>41863779 #>>41863821 #>>41863886 #>>41864628 #>>41864828 #>>41869772 #
conradev ◴[] No.41863632[source]
That is my understanding as well: low power and low latency.

You can see this in action when evaluating a CoreML model on a macOS machine. The ANE takes half as long as the GPU which takes half as long as the CPU (actual factors being model dependent)

replies(1): >>41863665 #
nickpsecurity ◴[] No.41863665[source]
To take half as long, doesn’t it have to perform twice as fast? Or am I misreading your comment?
replies(2): >>41863726 #>>41865127 #
1. conradev ◴[] No.41865127[source]
The GPU is stateful and requires loading shaders and initializing pipelines before doing any work. That is where its latency comes from. It is also extremely power hungry.

The CPU is zero latency to get started, but takes longer because it isn't specialized at any one task and isn't massively parallel, so that is why the CPU takes even longer.

The NPU often has a simpler bytecode to do more complex things like matrix multiplication implemented in hardware, rather than having to instantiate a generic compute kernel on the GPU.