InTheArena No.40051885
While everyone has focused on the power efficiency of Apple's M-series chips, one thing that has been very interesting is how powerful the unified memory model (with the memory on-package with the CPU) and its large memory bandwidth actually are. Hence a lot of people in the local LLaMA community are really going after high-memory Macs.
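
Back-of-the-envelope sketch of why bandwidth dominates here (all figures below are my own illustrative assumptions, not measurements): token generation in a local LLM is roughly memory-bandwidth-bound, since each generated token streams the full weight set through the memory bus once.

    # Crude upper bound on decode speed: one full pass over the weights per token.
    # Every number here is an illustrative assumption.
    def max_tokens_per_sec(bandwidth_gb_s, weights_gb):
        return bandwidth_gb_s / weights_gb

    weights_gb = 70e9 * 0.5 / 1e9   # ~35 GB: a 70B-parameter model at 4 bits/weight
    ddr5_pc    = 2 * 8 * 5.6        # ~89.6 GB/s: dual-channel DDR5-5600 desktop
    m2_max     = 400.0              # GB/s: Apple-style on-package unified memory (M2 Max class)

    print(f"desktop DDR5: {max_tokens_per_sec(ddr5_pc, weights_gb):.1f} tok/s")
    print(f"M2 Max-class: {max_tokens_per_sec(m2_max, weights_gb):.1f} tok/s")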

It's great to see NPUs here with the new Ryzen cores - but I wonder how effective they will be with off-die memory versus the Apple approach.

That said, it's nothing but great to see these capabilities in something other than an expensive NVIDIA card. Local NPUs may really help with deploying more inference capabilities at the edge.

Edited - sorry, meant on-package.

thsksbd No.40052857
Old becomes new: the SGI O2 had a unified (off-chip) memory model for performance reasons.

Not a CS guy, but it seems to me that a NUMA-like architecture has to come back: large RAM on-chip (balancing the thermal budget between number of cores and RAM), a much larger RAM pool off-chip, and even more RAM over a fast interconnect, all under a single kernel image, like the Origin 300 had.
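
A minimal sketch of that tiering idea, with invented latency and placement figures just to show what the kernel would be optimizing on such a machine:

    # Hypothetical three-tier hierarchy; every figure here is invented for illustration.
    tiers = {
        "on-package RAM":   {"latency_ns": 60,  "page_share": 0.10},
        "off-chip RAM":     {"latency_ns": 110, "page_share": 0.60},
        "interconnect RAM": {"latency_ns": 400, "page_share": 0.30},
    }
    # Average access latency, weighted by where pages live -- the placement
    # problem a NUMA-aware kernel would be solving.
    avg_ns = sum(t["latency_ns"] * t["page_share"] for t in tiers.values())
    print(f"average access latency: {avg_ns:.0f} ns")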

Rinzler89 No.40053224
UMA in the SGI machines (and gaming consoles) made sense because all the memory chips at that time were equally slow, or fast, depending on how you wanna look at it.

PC hardware split the video memory from system memory once GDDR became so much faster than system RAM, but GDDR has too high latency for CPUs and DDR has too little bandwidth for GPUs, so the separation played to each one's strengths and still does to this day. Unifying it again, like with AMD's APUs, means either compromises for the CPU or for the GPU. There's no free lunch.

Currently AMD APUs on the PC use unified DDR, so CPU performance is top but GPU/NPU performance is bottlenecked. If they were to use unified GDDR like in the PS5/Xbox, then GPU/NPU performance would be top and CPU performance would be bottlenecked.
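
The rough arithmetic behind that split (bandwidth = bus width x transfer rate; the clocks below are typical-ish figures I'm assuming, not exact specs):

    # GB/s = (bus width in bytes) * (transfers per second); figures approximate.
    def bandwidth_gb_s(bus_bits, mt_per_s):
        return (bus_bits / 8) * mt_per_s / 1000

    print(bandwidth_gb_s(128, 5600))   # ~89.6 GB/s: dual-channel DDR5-5600 (desktop CPU)
    print(bandwidth_gb_s(256, 16000))  # ~512 GB/s: 256-bit GDDR6 at 16 Gbps (mid-range GPU)
    print(bandwidth_gb_s(256, 14000))  # ~448 GB/s: PS5-style unified GDDR6 at 14 Gbps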

Dalewyn No.40053947
>Unifying it again, like with AMD's APUs, means either compromises for the CPU or for the GPU. There's no free lunch.

I think the lunch here (it still ain't free) is that RAM speed means nothing if you don't have enough RAM in the first place, and this is a compromise solution to that practical problem.

Dylan16807 No.40055288
The real problem is a lack of competition in GPU production.

GDDR is not very expensive. You should be able to get a GPU with a mid-level chip and tons of memory, but it's just not offered. Instead, please pay triple or quadruple the price of a high-end gaming GPU to get a model with double the memory and basically the same core.

The level of markup is astonishing. I can go from 8GB to 16GB on AMD for $60, but going from 24GB to 48GB costs $3000. And Nvidia isn't better.
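
Putting those quoted prices in $/GB terms makes the markup explicit:

    # Using the prices quoted above.
    consumer_per_gb    = 60 / (16 - 8)     # 8GB -> 16GB upgrade on a consumer card
    workstation_per_gb = 3000 / (48 - 24)  # 24GB -> 48GB upgrade on a workstation card
    print(f"consumer:    ${consumer_per_gb:.2f}/GB")     # $7.50/GB
    print(f"workstation: ${workstation_per_gb:.2f}/GB")  # $125.00/GB, ~17x the consumer rate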

paulmd No.40069867
What exactly do you expect them to do about the routing? 384-bit GPUs are as big as memory buses get in the modern era, and that gives you 24GB capacity, or 48GB when clamshelled. Higher-density 3GB modules that would allow 36GB/72GB have been repeatedly pushed back, which has kinda screwed over consoles as well: Microsoft is in the same bind with "insufficient" VRAM, and while Sony is less badly off, they didn't give a VRAM increase with the PS5 Pro either.
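
The capacity arithmetic being described, as a sketch (assuming one 32-bit GDDR channel per chip, two chips per channel when clamshelled; the helper is hypothetical):

    # VRAM capacity from bus width and module density.
    def vram_gb(bus_bits, module_gb, clamshell=False):
        chips = bus_bits // 32                 # one GDDR chip per 32-bit channel
        return chips * (2 if clamshell else 1) * module_gb

    print(vram_gb(384, 2))                     # 24 GB: 12 x 2GB modules
    print(vram_gb(384, 2, clamshell=True))     # 48 GB
    print(vram_gb(384, 3))                     # 36 GB: the delayed 3GB modules
    print(vram_gb(384, 3, clamshell=True))     # 72 GB
    print(vram_gb(512, 2))                     # 32 GB: the 512-bit case mentioned below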

GB202 might be going to a 512-bit bus, which is unprecedented in the modern era (nobody has done it since the Hawaii/GCN2 days), but that's really as big as anybody cares to route right now. What do you propose for going past that?

Ironically, AMD actually does have the capability to experiment and put bigger memory buses on smaller dies. And the MCM packaging actually does give you some physical fanout that makes the routing easier. But again, ultimately there is just no appetite for going to 512-bit or 768-bit buses at a design level, for very good technical reasons.

Like it just is what it is: GDDR is just not very dense, and this is what you can get out of it. Production of HBM-based GPUs is constrained by stacking capacity, which is the same reason people can't get enough datacenter cards in the first place.

Higher-density LPDDR does exist, but you still have to route it, and bandwidth goes down quite a bit. And that's the Apple Silicon approach, which solves your problem but unfortunately a lot of people just flatly reject any offering from the Fruit Company for interpersonal reasons.

Dylan16807 No.40072886
> What exactly do you expect them to do about the routing? 384-bit GPUs are as big as memory buses get in the modern era, and that gives you 24GB capacity, or 48GB when clamshelled.

I'm not asking for more than that. I'm asking for that to be available on a mainstream consumer model.

Most of the mid-tier GPUs from AMD and Nvidia have 256-bit memory buses. I want 32GB on that bus at a reasonable price; let's say $150 or $200 more than the 16GB version of the same model.
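
In the same capacity arithmetic as the sketch above, that ask is just a clamshell build on the existing bus:

    # A 256-bit bus has 256/32 = 8 GDDR channels.
    chips = 256 // 32
    print(chips * 2)      # 16 GB: 8 x 2GB modules, the common mid-tier config
    print(chips * 2 * 2)  # 32 GB: clamshell, two 2GB chips per channel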

I appreciate the information about higher densities, though.