486 points dbreunig | 1 comments | source
dmitrygr ◴[] No.41863335[source]
In general MAC unit utilization tends to be low for transformers, but 1.3% seems pretty bad. I wonder if they fucked up the memory interface for the NPU. All the MACs in the world are useless if you cannot feed them.
replies(2): >>41863438 #>>41863595 #
moffkalast ◴[] No.41863438[source]
I recall looking over the Ryzen AI architecture and the NPU is just plugged into PCIe and thus gets completely crap memory bandwidth. I would expect it might be similar here.
replies(2): >>41863770 #>>41864166 #
1. PaulHoule ◴[] No.41863770[source]
I spent a lot of time with a business partner and an expert looking at the design space for accelerators, and it was made very clear to me that the memory interface puts a hard limit on what you can do and that it is difficult to make the most of it. Particularly if a half-baked product is being rushed out because of FOMO, you’d practically expect them to ship something that gives a few percent of the performance because the memory interface doesn’t really work. It happens to the best of them:

https://en.wikipedia.org/wiki/Cell_(processor)
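
One way to make the "hard limit" from the memory interface concrete is a roofline-style estimate: attainable throughput is the smaller of the compute peak and memory bandwidth times arithmetic intensity. The sketch below is only an illustration. The function names and all the numbers (peak rate, bandwidth, MACs per byte) are made up for the example and are not measurements of any NPU discussed in this thread.

```python
def attainable_macs_per_s(peak_macs_per_s, mem_bytes_per_s, macs_per_byte):
    """Roofline bound: the lower of the compute peak and what memory can feed."""
    return min(peak_macs_per_s, mem_bytes_per_s * macs_per_byte)


def utilization(peak_macs_per_s, mem_bytes_per_s, macs_per_byte):
    """Fraction of the peak MAC rate reachable under the roofline bound."""
    bound = attainable_macs_per_s(peak_macs_per_s, mem_bytes_per_s, macs_per_byte)
    return bound / peak_macs_per_s


if __name__ == "__main__":
    # Hypothetical accelerator, chosen only to show the shape of the problem.
    peak = 20e12       # assumed 20 TMAC/s of raw compute
    bandwidth = 50e9   # assumed 50 GB/s memory interface
    intensity = 1.0    # assumed 1 MAC per byte fetched (each weight used once)

    print(f"utilization: {utilization(peak, bandwidth, intensity):.2%}")
```

With low arithmetic intensity, the bandwidth term sets the bound and utilization lands at a fraction of a percent no matter how many MAC units are on the die. Raising the MAC count only lowers the ratio further.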