
172 points by marban | 1 comment
InTheArena No.40051885
While everyone has focused on the power efficiency of Apple's M-series chips, one thing that has been very interesting is how powerful the unified memory model (with the memory on-package alongside the CPU) and its high memory bandwidth actually are. Hence a lot of people in the local LLaMA community are really going after high-memory Macs.
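A rough way to see why bandwidth matters here: when generating one token at a time, a memory-bound LLM has to stream essentially all of its weights from memory for every token, so bandwidth divided by model size gives an upper bound on tokens per second. The sketch below assumes that simplified model (it ignores KV-cache traffic, compute limits, caches, and batching). The 89.6 GB/s figure is the theoretical peak of dual-channel DDR5-5600, 800 GB/s is Apple's advertised M2 Ultra figure, and the 40 GB model size is purely illustrative.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound LLM.

    Simplifying assumption: each generated token reads every weight from
    memory exactly once. KV-cache reads, compute limits, on-chip caches
    and batching are all ignored.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    model_gb = 40.0  # illustrative model size, not a specific model
    for name, bw in [
        ("dual-channel DDR5-5600 (theoretical peak)", 89.6),
        ("M2 Ultra (Apple's advertised figure)", 800.0),
    ]:
        print(f"{name}: at most ~{max_tokens_per_second(bw, model_gb):.1f} tokens/s")
```

Real-world throughput lands below these bounds; the point is the roughly 9x gap between the two memory systems for the same model.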

It's great to see NPUs here with the new Ryzen cores - but I wonder how effective they will be with off-die memory versus the Apple approach.

That said, it's nothing but great to see these capabilities in something other than an expensive NVIDIA card. Local NPUs may really help with deploying more conferencing capabilities at the edge.

Edit: sorry, I meant on-package.

thsksbd No.40052857
Old becomes new: the SGI O2 had a unified memory model (off chip) for performance reasons.

Not a CS guy, but it seems to me that a NUMA-like architecture has to come back: large RAM on chip (balancing a thermal budget between the number of cores and RAM), much larger RAM off chip, and even more RAM through a fast interconnect, all under a single kernel image, like the Origin 300 had.

Rinzler89 No.40053224
UMA in the SGI machines (and gaming consoles) made sense because all the memory chips at that time were equally slow, or fast, depending on how you want to look at it.

PC hardware split video memory from system memory once GDDR became so much faster than system RAM. But GDDR has too high latency for CPUs and DDR has too low bandwidth for GPUs, so the separation played to each one's strengths, and it still does to this day. Unifying it again, like with AMD's APUs, means either compromises for the CPU or for the GPU. There's no free lunch.

Currently, AMD APUs on the PC use unified DDR memory, so CPU performance is top but GPU/NPU performance is bottlenecked. If they were to use unified GDDR memory like the PS5/Xbox, GPU/NPU performance would be top and CPU performance would be bottlenecked.
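For a sense of scale on the bandwidth side of this trade-off: theoretical peak bandwidth is bus width times transfer rate divided by 8 bits per byte. The sketch compares a typical dual-channel DDR5-5600 setup (128-bit total bus) with the PS5's 256-bit GDDR6 at 14 Gbps per pin. These configurations are examples chosen for illustration, and the formula says nothing about latency, which is the other half of the trade-off described above.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s.

    bus_width_bits: total data bus width (e.g. 128 for a dual-channel DDR setup)
    transfer_rate_mt_s: mega-transfers per second per pin (DDR5-5600 -> 5600)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    ddr5 = peak_bandwidth_gb_s(128, 5600)    # dual-channel DDR5-5600
    gddr6 = peak_bandwidth_gb_s(256, 14000)  # 256-bit GDDR6 at 14 Gbps/pin (PS5)
    print(f"DDR5: {ddr5:.1f} GB/s, GDDR6: {gddr6:.1f} GB/s, ratio {gddr6 / ddr5:.1f}x")
```

Both are peak figures; sustained bandwidth is lower in practice, but the roughly 5x gap is why a DDR-based APU's GPU/NPU ends up starved.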

Dalewyn No.40053947
>Unifying it again, like with AMD's APUs, means either compromises for the CPU or for the GPU. There's no free lunch.

I think the lunch here (it still ain't free) is that RAM speed means nothing if you don't have enough RAM in the first place, and this is a compromise solution to that practical problem.

Rinzler89 No.40054000
>if you don't have enough RAM in the first place

Enough RAM for what task exactly? System RAM is plentiful and cheap nowadays (unless you buy Apple). I got a new laptop with 32GB of RAM for about 750 Euros. But the speeds are too low for the poor APU to handle high-end gaming or LLM training.

numpad0 No.40054101
Enough RAM for LLMs. There are GPUs faster than the M2 Ultra that can't run LLMs normally for lack of memory, which makes that speed a moot point for LLM use cases.
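To make "enough RAM for LLMs" concrete: the weights alone take roughly parameter count × bits per parameter ÷ 8 bytes. The sketch below checks whether a 70B-parameter model's weights fit in a given memory pool. The 24 GB and 192 GB capacities are example figures (a 24 GB consumer GPU and the M2 Ultra's maximum unified memory), and the estimate ignores KV cache, activations, and OS overhead, so real requirements are higher.

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: float, memory_gb: float) -> bool:
    # Weights only: KV cache, activations and OS overhead are ignored here.
    return weight_footprint_gb(params_billion, bits_per_param) <= memory_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        need = weight_footprint_gb(70, bits)
        print(
            f"70B @ {bits}-bit: ~{need:.0f} GB | "
            f"24 GB GPU: {fits(70, bits, 24)} | 192 GB unified: {fits(70, bits, 192)}"
        )
```

Even at 4-bit quantization the weights (~35 GB) overflow a 24 GB card, so that card's bandwidth advantage doesn't help unless the model is split or offloaded.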