
486 points dbreunig | 8 comments
1. dmitrygr ◴[] No.41863335[source]
In general MAC unit utilization tends to be low for transformers, but 1.3% seems pretty bad. I wonder if they fucked up the memory interface for the NPU. All the MACs in the world are useless if you cannot feed them.
replies(2): >>41863438 #>>41863595 #
2. moffkalast ◴[] No.41863438[source]
I recall looking over the Ryzen AI architecture and the NPU is just plugged into PCIe and thus gets completely crap memory bandwidth. I would expect it might be similar here.
replies(2): >>41863770 #>>41864166 #
3. Hizonner ◴[] No.41863595[source]
It's a tablet. It probably has like one DDR channel. It's not so much that they "fucked it up" as that they knowingly built a grossly unbalanced system so they could report a pointless number.
replies(1): >>41863634 #
4. dmitrygr ◴[] No.41863634[source]
Well, no. If the CPU can hit better numbers on the same model then the bandwidth from the DDR IS there. Probably the NPU does not attach to the proper cache level, or just has a very thin pipe to it.
replies(1): >>41863711 #
5. Hizonner ◴[] No.41863711{3}[source]
The CPU is only about twice as good as the NPU, though (four times as good on one test). The NPU is being advertised as capable of 45 trillion operations per second, and he's getting 1.3 percent of that.

So, OK, yeah, I concede that the NPU may have even worse access to memory than the CPU, but the bottom line is that neither one of them has anything close to what it needs to actually deliver anything like the marketing headline performance number on any realistic workload.

I bet a lot of people have bought those things after seeing "45 TOPS", thinking that they'd be able to usefully run transformers the size of main memory, and that's not happening on CPU or NPU.
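
To put rough numbers on that, here is a minimal roofline-style sketch. Only the 45 TOPS peak and the 1.3% utilization come from the thread; the memory bandwidth and the ops-per-byte figure are made-up placeholders (I don't know the real ones for this device), and the function name is just for illustration.

```python
def attainable_ops_per_s(peak_ops_per_s, bytes_per_s, ops_per_byte):
    """Roofline bound: throughput is capped by the compute peak or by
    memory traffic (bandwidth * arithmetic intensity), whichever is lower."""
    return min(peak_ops_per_s, bytes_per_s * ops_per_byte)


PEAK_OPS = 45e12        # advertised 45 TOPS (from the thread)
UTILIZATION = 0.013     # 1.3% reported utilization (from the thread)

# ASSUMPTIONS, not measured values for any real device:
ASSUMED_BYTES_PER_S = 100e9   # hypothetical 100 GB/s memory bandwidth
ASSUMED_OPS_PER_BYTE = 2.0    # hypothetical low-reuse workload

# What 1.3% of the headline number works out to.
measured_ops = UTILIZATION * PEAK_OPS

# Ops per byte fetched that a workload would need to reach the peak
# at the assumed bandwidth.
required_ops_per_byte = PEAK_OPS / ASSUMED_BYTES_PER_S

# Upper bound for the assumed low-reuse workload.
bound_ops = attainable_ops_per_s(PEAK_OPS, ASSUMED_BYTES_PER_S, ASSUMED_OPS_PER_BYTE)

print(f"measured:  {measured_ops / 1e12:.3f} TOPS")
print(f"needed:    {required_ops_per_byte:.0f} ops/byte to hit peak")
print(f"bound:     {bound_ops / 1e12:.3f} TOPS at {ASSUMED_OPS_PER_BYTE} ops/byte")
```

Under those assumed numbers, a workload would have to reuse every byte it fetches hundreds of times to get anywhere near the headline figure, and anything with little reuse is capped by the memory pipe long before the MACs matter.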

replies(1): >>41863731 #
6. dmitrygr ◴[] No.41863731{4}[source]
Yup, sad all round. We are in agreement.
7. PaulHoule ◴[] No.41863770[source]
I spent a lot of time with a business partner and an expert looking at the design space for accelerators, and it was made very clear to me that the memory interface puts a hard limit on what you can do and that it is difficult to make the most of. Particularly if a half-baked product is being rushed out because of FOMO, you’d practically expect them to ship something that gives a few percent of the performance because the memory interface doesn’t really work. It happens to the best of them:

https://en.wikipedia.org/wiki/Cell_(processor)

8. wtallis ◴[] No.41864166[source]
It's unlikely to be literally connected over PCIe when it's on the same chip. It just looks like it's connected over PCIe because that's how you make peripherals discoverable to the OS. The integrated GPU also appears to be connected over PCIe, but obviously has access to far more memory bandwidth.