486 points dbreunig | 3 comments
jsheard ◴[] No.41863390[source]
These NPUs are tying up a substantial amount of silicon area, so it would be a real shame if they end up not being used for much. I can't find a die analysis of the Snapdragon X which isolates the NPU specifically, but AMD's equivalent with the same ~50 TOPS performance target can be seen here, and takes up about as much area as three high performance CPU cores:

https://www.techpowerup.com/325035/amd-strix-point-silicon-p...

replies(4): >>41863880 #>>41863905 #>>41864412 #>>41865466 #
Kon-Peki ◴[] No.41863880[source]
Modern chips have to dedicate a certain percentage of the die to dark silicon [1] (or else they melt/throttle to uselessness), and these kinds of components count towards that amount. So the point of these components is to be used, but not to be used too much.

Instead of an NPU, they could have used those transistors and die space for any number of things. But they wouldn't have put additional high performance CPU cores there - that would increase the power density too much and cause thermal issues that can only be solved with permanent throttling.

[1] https://en.wikipedia.org/wiki/Dark_silicon

replies(2): >>41864171 #>>41865813 #
jcgrillo ◴[] No.41865813[source]
Question--what's to be lost by making your features sufficiently not dense to allow them to cool at full tilt?
replies(2): >>41865917 #>>41866644 #
1. AlotOfReading ◴[] No.41865917[source]
Messes with timing, among other things. A lot of those structures are relatively fixed blocks that are designed for specific sizes. Signals take more time to propagate longer distances, and longer conductors have worse properties. Dense and hot is faster and more broadly useful.
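
To put a rough number on the propagation point: a minimal sketch, assuming the textbook first-order Elmore model for an unrepeated distributed RC wire (delay ≈ 0.5 · r · c · L²). The per-millimetre resistance and capacitance below are made-up illustrative values, not figures for any real process, and real designs insert repeaters that bring the growth closer to linear.

```python
def wire_delay_s(length_mm: float, r_ohm_per_mm: float, c_f_per_mm: float) -> float:
    """Elmore delay of an unrepeated distributed RC wire: 0.5 * r * c * L^2."""
    return 0.5 * r_ohm_per_mm * c_f_per_mm * length_mm ** 2


# Hypothetical wire parameters, chosen only for illustration.
R_OHM_PER_MM = 1000.0
C_F_PER_MM = 0.2e-12

short_wire = wire_delay_s(1.0, R_OHM_PER_MM, C_F_PER_MM)
long_wire = wire_delay_s(2.0, R_OHM_PER_MM, C_F_PER_MM)

# Both resistance and capacitance grow with length, so delay grows with L^2.
delay_ratio = long_wire / short_wire

if __name__ == "__main__":
    print(f"1 mm wire: {short_wire * 1e12:.0f} ps")
    print(f"2 mm wire: {long_wire * 1e12:.0f} ps")
    print(f"ratio: {delay_ratio:.1f}x")
```

Under this model, doubling the distance between two blocks roughly quadruples the wire delay, which is why spreading a block out costs clock speed.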
replies(1): >>41866001 #
2. jcgrillo ◴[] No.41866001[source]
Interesting, so does that mean we're basically out of runway without aggressive cooling?
replies(1): >>41867074 #
3. joha4270 ◴[] No.41867074[source]
No.

Every successive semiconductor node uses less power per transistor than the previous one at the same clock speed. It's just that we then immediately use this headroom to pack more transistors closer together and run them faster, so every chip keeps running into power limits, even if they continually do more with said power.
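
The arithmetic behind that can be sketched with the standard dynamic power approximation P ≈ N · α · C · V² · f. Every number below (transistor counts, capacitance, voltage, clock, activity factor, and the scaling ratios between the two nodes) is a hypothetical value picked for illustration, not data for any real process, and leakage power is ignored.

```python
def dynamic_power_w(n_transistors: float, cap_f: float, vdd: float,
                    freq_hz: float, activity: float = 0.01) -> float:
    """Switching power: N * alpha * C * V^2 * f (leakage ignored)."""
    return n_transistors * activity * cap_f * vdd ** 2 * freq_hz


# Hypothetical "old node" chip.
old_n, old_cap, old_vdd, old_freq = 1.0e9, 1.0e-15, 1.0, 3.0e9

# Hypothetical "new node": lower capacitance and voltage per transistor.
new_cap, new_vdd = 0.7e-15, 0.9

# Same clock, one transistor: the new node is cheaper per transistor.
old_per_transistor = dynamic_power_w(1, old_cap, old_vdd, old_freq)
new_per_transistor = dynamic_power_w(1, new_cap, new_vdd, old_freq)

# Spend the headroom: more transistors and a higher clock (assumed ratios).
new_n, new_freq = 1.6 * old_n, 1.1 * old_freq

old_chip = dynamic_power_w(old_n, old_cap, old_vdd, old_freq)
new_chip = dynamic_power_w(new_n, new_cap, new_vdd, new_freq)

if __name__ == "__main__":
    print(f"per-transistor power ratio: {new_per_transistor / old_per_transistor:.2f}")
    print(f"old chip: {old_chip:.1f} W, new chip: {new_chip:.1f} W")
```

With these assumed ratios each transistor needs a bit over half the power at the same clock, yet the new chip lands at about the same total power as the old one because the savings went into 1.6x the transistors at a 1.1x clock.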