
486 points dbreunig | 4 comments
eightysixfour ◴[] No.41863546[source]
I thought the purpose of these things was not to be fast, but to run small models with very little power usage? I have a newer AMD laptop with an NPU, and my power usage doesn't change when using the video effects that supposedly run on it, but it goes up when using the NVIDIA Studio effects.

It seems like the NPUs are for heavily optimized models that do small tasks, like eye contact correction, background blur, autocorrect, transcription, and OCR. In particular, on Windows, I assumed they were running the full-screen OCR (and maybe the embeddings for search) for the rewind feature.

replies(7): >>41863632 #>>41863779 #>>41863821 #>>41863886 #>>41864628 #>>41864828 #>>41869772 #
conradev ◴[] No.41863632[source]
That is my understanding as well: low power and low latency.

You can see this in action when evaluating a Core ML model on a macOS machine: the ANE takes half as long as the GPU, which takes half as long as the CPU (the actual factors being model-dependent).
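
A minimal sketch of that comparison, restricting the compute units Core ML is allowed to use. The "model.mlmodelc" path and the empty input provider are placeholders; a real model needs its actual input features, and the first prediction includes load/compile warm-up, so in practice you'd average several runs:

    import CoreML
    import Foundation

    func timePrediction(_ units: MLComputeUnits, _ label: String) throws {
        let config = MLModelConfiguration()
        config.computeUnits = units
        // Placeholder path: point this at any compiled Core ML model.
        let model = try MLModel(contentsOf: URL(fileURLWithPath: "model.mlmodelc"),
                                configuration: config)
        // Placeholder input: a real model needs its actual features here.
        let input = try MLDictionaryFeatureProvider(dictionary: [:])
        let start = Date()
        _ = try model.prediction(from: input)
        print("\(label): \(Date().timeIntervalSince(start) * 1000) ms")
    }

    try! timePrediction(.cpuOnly, "CPU only")
    try! timePrediction(.cpuAndGPU, "CPU + GPU")
    try! timePrediction(.cpuAndNeuralEngine, "CPU + ANE")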

replies(1): >>41863665 #
1. nickpsecurity ◴[] No.41863665[source]
To take half as long, doesn’t it have to perform twice as fast? Or am I misreading your comment?
replies(2): >>41863726 #>>41865127 #
2. eightysixfour ◴[] No.41863726[source]
No, you can have latency that is independent of compute performance. The CPU/GPU may have other tasks, so the work has to wait for existing threads to finish or for the hardware to clock up, or it may go through slower memory paths, etc.

If you and I have the same calculator but I'm working on a set of problems and you're not, and we're both asked to do some math, it may take me longer to return an answer, even though the instantaneous performance of the math is the same.
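
A toy sketch of that idea: the math costs the same CPU time on both queues, but one queue is already occupied with unrelated work (the 0.5 s sleep and the sum are arbitrary stand-ins), so the answer comes back later:

    import Foundation

    // Same "calculator", different wait before it gets to our problem.
    func doMath() -> Int { (1...1_000_000).reduce(0, +) }

    let idle = DispatchQueue(label: "idle")
    let busy = DispatchQueue(label: "busy")

    // Occupy the busy queue with an unrelated task first.
    busy.async { Thread.sleep(forTimeInterval: 0.5) }

    for (name, queue) in [("idle", idle), ("busy", busy)] {
        let submitted = Date()
        queue.sync {
            let waited = Date().timeIntervalSince(submitted)
            _ = doMath()
            print("\(name) queue: waited \(waited)s before the math even started")
        }
    }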

replies(1): >>41863792 #
3. refulgentis ◴[] No.41863792[source]
In isolation, makes sense.

Wouldn't it be odd for OP to present examples that are the opposite of their claim, just to get us thinking "well, the CPU is busy"?

Curious for their input.

4. conradev ◴[] No.41865127[source]
The GPU is stateful and requires loading shaders and initializing pipelines before doing any work. That is where its latency comes from. It is also extremely power hungry.
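
A rough sketch of that setup cost with Metal; the kernel here is a trivial stand-in rather than a real workload, since the interesting part is how much has to be created and compiled before any work can be dispatched:

    import Foundation
    import Metal

    // Stand-in kernel: the point is the setup, not the math.
    let source = """
    #include <metal_stdlib>
    using namespace metal;
    kernel void scale(device float *data [[buffer(0)]],
                      uint id [[thread_position_in_grid]]) {
        data[id] *= 2.0;
    }
    """

    let start = Date()
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue() else {
        fatalError("no Metal device")
    }
    let library = try! device.makeLibrary(source: source, options: nil)     // shader compile
    let function = library.makeFunction(name: "scale")!
    let pipeline = try! device.makeComputePipelineState(function: function) // pipeline init
    print("GPU ready for work after \(Date().timeIntervalSince(start) * 1000) ms")
    _ = (queue, pipeline) // only now can a command buffer encode the actual dispatch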

The CPU is essentially zero latency to get started, but it isn't specialized for any one task and isn't massively parallel, which is why it takes even longer overall.

The NPU often has a simpler bytecode, with more complex operations like matrix multiplication implemented directly in hardware, rather than having to instantiate a generic compute kernel on the GPU.