486 points by dbreunig | 3 comments
dmitrygr No.41863335
In general MAC unit utilization tends to be low for transformers, but 1.3% seems pretty bad. I wonder if they fucked up the memory interface for the NPU. All the MACs in the world are useless if you cannot feed them.
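The "can't feed the MACs" point is what a roofline model captures: attainable throughput is the lower of peak compute and memory bandwidth times arithmetic intensity (MACs per byte moved). Transformer decoding at batch size 1 is dominated by matrix-vector products, which do about one MAC per weight read, so they sit deep in the bandwidth-limited region. The Python below is a minimal sketch under stated assumptions: the accelerator figures are hypothetical round numbers, not measurements of any real NPU; each weight is assumed to be read from memory exactly once per pass; activation traffic and caches are ignored. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    peak_macs_per_s: float   # peak multiply-accumulates per second
    mem_bytes_per_s: float   # sustained memory bandwidth, bytes per second


def matmul_cost(rows: int, cols: int, batch: int, bytes_per_weight: int) -> tuple[int, int]:
    """MACs and bytes for multiplying a rows x cols weight matrix by `batch` vectors.

    Assumption: each weight is fetched from memory once and reused across the
    whole batch; activation reads and writes are ignored.
    """
    macs = rows * cols * batch
    bytes_moved = rows * cols * bytes_per_weight
    return macs, bytes_moved


def attainable_macs_per_s(acc: Accelerator, macs: int, bytes_moved: int) -> float:
    """Roofline bound: the lower of the compute ceiling and the bandwidth slope."""
    intensity = macs / bytes_moved  # MACs per byte of memory traffic
    return min(acc.peak_macs_per_s, acc.mem_bytes_per_s * intensity)


def utilization(acc: Accelerator, macs: int, bytes_moved: int) -> float:
    """Best-case fraction of peak MAC throughput the memory system allows."""
    return attainable_macs_per_s(acc, macs, bytes_moved) / acc.peak_macs_per_s


if __name__ == "__main__":
    # Hypothetical NPU: 20 TMAC/s peak, 100 GB/s of memory bandwidth, int8 weights.
    npu = Accelerator(peak_macs_per_s=20e12, mem_bytes_per_s=100e9)
    for batch in (1, 64):
        macs, nbytes = matmul_cost(4096, 4096, batch, bytes_per_weight=1)
        print(f"batch={batch}: {utilization(npu, macs, nbytes):.1%} of peak")
    # prints "batch=1: 0.5% of peak" then "batch=64: 32.0% of peak"
```

Under these made-up numbers, batch-1 decoding cannot exceed 0.5% of peak no matter how good the MAC array is; the only way up the roof is to reuse each loaded weight across more work.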
1. moffkalast No.41863438 (reply to dmitrygr)
I recall looking over the Ryzen AI architecture: the NPU is just plugged into PCIe and thus gets completely crap memory bandwidth. I'd expect something similar here.
2. PaulHoule No.41863770 (reply to moffkalast)
I spent a lot of time with a business partner and an expert looking at the design space for accelerators, and it was made very clear to me that the memory interface puts a hard limit on what you can do and is difficult to make the most of. Particularly when a half-baked product is being rushed out because of FOMO, you’d practically expect them to ship something that delivers a few percent of its peak performance because the memory interface doesn’t really work. It happens to the best of them:

https://en.wikipedia.org/wiki/Cell_(processor)
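
One classic way to get the most out of a memory interface, and one the Cell's SPEs leaned on because they could only compute on data explicitly DMA'd into their small local stores, is double buffering: fetch the next tile while computing on the current one. The toy timing model below is a sketch, not a model of any real chip: the function names are hypothetical, per-tile load and compute times are fixed, and the memory system is assumed to always deliver its nominal rate. It shows both the payoff and the hard limit: overlap hides the shorter stage, but the pipeline can never run faster than its slower stage, so when loads take longer than compute, the memory interface sets the ceiling.

```python
def serial_time(n_tiles: int, load: float, compute: float) -> float:
    """One buffer: load a tile, then compute on it, with no overlap."""
    return n_tiles * (load + compute)


def double_buffered_time(n_tiles: int, load: float, compute: float) -> float:
    """Two buffers: the load of tile i+1 overlaps the compute on tile i.

    The first load and the last compute cannot be hidden; every step in
    between costs whichever stage is slower.
    """
    if n_tiles <= 0:
        return 0.0
    return load + (n_tiles - 1) * max(load, compute) + compute


def compute_utilization(n_tiles: int, compute: float, total_time: float) -> float:
    """Fraction of wall-clock time the compute units are busy."""
    return n_tiles * compute / total_time if total_time > 0 else 0.0


if __name__ == "__main__":
    n, load, compute = 100, 3.0, 1.0  # memory-bound: loads take 3x longer than math
    for name, t in (("serial", serial_time(n, load, compute)),
                    ("double-buffered", double_buffered_time(n, load, compute))):
        print(f"{name}: time={t:.0f}, compute busy {compute_utilization(n, compute, t):.0%}")
    # prints "serial: time=400, compute busy 25%"
    # then   "double-buffered: time=301, compute busy 33%"
```

Overlap saves about a quarter of the time here (400 to 301 units), yet the compute units still sit idle roughly two-thirds of the time, because no amount of scheduling makes a 3-unit load finish in 1.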

3. wtallis No.41864166 (reply to moffkalast)
It's unlikely to be literally connected over PCIe when it's on the same chip. It just looks like it's connected over PCIe because that's how you make peripherals discoverable to the OS. The integrated GPU also appears to be connected over PCIe, but obviously has access to far more memory bandwidth.
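On Linux, for example, you can see the discoverability mechanism wtallis describes: every function the kernel enumerates on the PCI bus gets an entry under /sys/bus/pci/devices with vendor, device, and class attribute files, whether it sits behind a physical PCIe link or is integrated on the same die. The Python sketch below just reads those files. The helper names and the partial class table are illustrative, and appearing in this list does not by itself tell you anything about the physical interconnect or its bandwidth.

```python
from pathlib import Path

# A few PCI base-class codes (the top byte of the 24-bit class register).
# Deliberately incomplete; anything else is reported by its raw number.
BASE_CLASSES = {
    0x01: "mass storage controller",
    0x02: "network controller",
    0x03: "display controller",
    0x06: "bridge",
    0x0C: "serial bus controller",
    0x12: "processing accelerator",
}


def _read_hex(path: Path) -> int:
    """sysfs stores IDs as hex text, e.g. 0x1022, plus a trailing newline."""
    return int(path.read_text().strip(), 16)


def list_pci_devices(root: str = "/sys/bus/pci/devices") -> list[dict]:
    """Return vendor/device/class info for every PCI function found under `root`."""
    base = Path(root)
    if not base.is_dir():
        return []
    devices = []
    for dev in sorted(base.iterdir()):  # entries are named by address, e.g. 0000:00:02.0
        try:
            vendor = _read_hex(dev / "vendor")
            device = _read_hex(dev / "device")
            cls = _read_hex(dev / "class")
        except (OSError, ValueError):
            continue  # skip entries missing the expected attribute files
        base_class = cls >> 16
        devices.append({
            "address": dev.name,
            "vendor": vendor,
            "device": device,
            "class": cls,
            "kind": BASE_CLASSES.get(base_class, f"base class 0x{base_class:02x}"),
        })
    return devices


if __name__ == "__main__":
    for d in list_pci_devices():
        print(f"{d['address']}  {d['vendor']:04x}:{d['device']:04x}  {d['kind']}")
```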