
172 points marban | 3 comments
InTheArena ◴[] No.40051885[source]
While everyone has focused on the power efficiency of Apple's M-series chips, one thing that has been very interesting is how powerful the unified memory model (with the memory on-package with the CPU), combined with large bandwidth to that memory, actually is. Hence a lot of people in the local LLaMA community are really going after high-memory Macs.

It's great to see NPUs here with the new Ryzen cores - but I wonder how effective they will be with off-die memory versus the Apple approach.

That said, it's nothing but great to see these capabilities in something other than an expensive NVIDIA card. Local NPUs may really help with deploying more inference capabilities at the edge.

Edited - sorry, meant on-package.

replies(8): >>40051950 #>>40052032 #>>40052167 #>>40052857 #>>40053126 #>>40054064 #>>40054570 #>>40054743 #
AceJohnny2 ◴[] No.40054743[source]
> unified memory model (by having the memory on-package with CPU)

That's not what "unified memory model" means.

It means that the CPU and GPU (and ANE!) have access to the same banks of memory, unlike PC GPUs, which have their own memory, separated from the CPU's by the PCIe bottleneck (fast as that is, its bandwidth is still well below that of direct shared DRAM access).

It gives the hardware more flexibility in how the single pool of memory is allocated across devices, and allows faster sharing of data between them. (Throughput/latency depend on the internal system bus ports and how many each device has access to.)

The Apple M-series chips also have the memory on-package with the CPU (technically the SoC, "System-on-Chip"), but that provides different benefits.

replies(2): >>40055425 #>>40056022 #
1. cmovq ◴[] No.40055425[source]
Having separate GPU memory also has its benefits. Once the data makes it across the PCIe bus, graphics memory typically has much higher bandwidth, which also doesn't need to be shared with the CPU.
replies(2): >>40056425 #>>40062642 #
2. crawshaw ◴[] No.40056425[source]
An M2 Ultra has 800GB/s of memory bandwidth, an Nvidia 4090 has 1008GB/s. Apple have chosen to use relatively little system memory at unusually high bandwidth.
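To put those numbers in context: single-stream LLM decoding is usually memory-bandwidth-bound, since every weight must be streamed from memory once per generated token. A back-of-the-envelope sketch in Python (the model size and quantization figures below are illustrative assumptions, not benchmarks):

```python
# Rough upper bound on decode speed for a bandwidth-bound LLM:
# each generated token streams every weight once, so
# tokens/s <= memory bandwidth / model size in bytes.

def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s when weight traffic dominates."""
    return bandwidth_gb_s / model_gb

# Assumed example: a 70B-parameter model at 4-bit quantization (~0.5 bytes/param).
model_gb = 70e9 * 0.5 / 1e9  # ~35 GB

for name, bw_gb_s in [("M2 Ultra", 800), ("RTX 4090", 1008)]:
    print(f"{name}: <= {max_tokens_per_sec(bw_gb_s, model_gb):.1f} tok/s")
```

By this bound the 4090's extra bandwidth buys only about 25% more tokens per second, and only if the model actually fits in its 24 GB of VRAM; a ~35 GB quantized 70B model doesn't, which is part of why high-memory Macs are attractive for local inference.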
3. mmaniac ◴[] No.40062642[source]
The benefit of having an ecosystem of discrete GPUs is that CPUs can get away with low-bandwidth memory. This is great if you want motherboards with a socketed CPU and socketed RAM that are compatible with the whole range of product segments.

CPUs don't really care about memory bandwidth until you get to extreme core counts (Threadripper/Xeon territory). Mainstream desktop and laptop CPUs are fine with just two channels of reasonably fast memory.
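For scale, peak DRAM bandwidth falls straight out of channel count and transfer rate; a quick sketch (the configurations below are illustrative, not tied to any specific part):

```python
def peak_bandwidth_gb_s(channels: int, mt_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x bus width x transfers/s.

    Each DDR channel is 64 bits (8 bytes) wide; mt_s is megatransfers per second.
    """
    return channels * bus_bytes * mt_s * 1e6 / 1e9

print(peak_bandwidth_gb_s(2, 5600))  # mainstream dual-channel DDR5-5600: 89.6
print(peak_bandwidth_gb_s(8, 4800))  # 8-channel server platform: 307.2
```

Compare those figures with the 800 GB/s quoted above for an M2 Ultra: Apple gives the whole SoC the kind of bandwidth the PC world reserves for discrete GPUs and many-channel server platforms.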

This would bottleneck an iGPU, but those are always weak anyway. The PC market's answer for users who need more is a discrete GPU: you pay the extra cost of high-bandwidth soldered memory only if you need it.

The calculation Apple has made is different. You get exactly what you need as a complete package: CPU, GPU, and the bandwidth to feed both, as a single integrated SoC all the way to the high end. Modularity is something PC users love, but doing away with it does have advantages for integration.