
172 points by marban | 5 comments
InTheArena ◴[] No.40051885[source]
While everyone has focused on the power efficiency of Apple's M-series chips, one thing that has been very interesting is how powerful the unified memory model (with the memory on-package with the CPU) and its large memory bandwidth actually are. Hence a lot of people in the local LLaMA community are really going after high-memory Macs.

It's great to see NPUs here with the new Ryzen cores - but I wonder how effective they will be with off-die memory versus the Apple approach.

That said, it's nothing but great to see these capabilities in something other than an expensive NVIDIA card. Local NPUs may really help with deploying more inferencing capability at the edge.

Edited - sorry, meant on-package.

replies(8): >>40051950 #>>40052032 #>>40052167 #>>40052857 #>>40053126 #>>40054064 #>>40054570 #>>40054743 #
thsksbd ◴[] No.40052857[source]
Old becomes new: the SGI O2 had a unified memory model (off-chip) for performance reasons.

Not a CS guy, but it seems to me that a NUMA-like architecture has to come back: large RAM on-chip (balancing the thermal budget between number of cores and RAM), a much larger RAM off-chip, and even more RAM through a fast interconnect, all under a single kernel image. Like the Origin 300 had.

replies(3): >>40053224 #>>40054054 #>>40055112 #
Rinzler89 ◴[] No.40053224[source]
UMA in the SGI machines (and in gaming consoles) made sense because all the memory chips at that time were equally slow, or fast, depending on how you want to look at it.

PC hardware split video memory from system memory once GDDR became so much faster than system RAM. But GDDR has too high latency for CPUs, and DDR has too little bandwidth for GPUs, so the separation played to each one's strengths and still does to this day. Unifying them again, like with AMD's APUs, means compromises either for the CPU or for the GPU. There's no free lunch.

Currently, AMD APUs on the PC use unified DDR, so CPU performance is top but GPU/NPU performance is bottlenecked. If they were to use unified GDDR like the PS5/Xbox, then GPU/NPU performance would be top and CPU performance would be bottlenecked.
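
To put rough numbers on that tradeoff (a back-of-the-envelope sketch; the DDR5 and GDDR6 figures are typical ballpark specs, not exact):

  # Peak bandwidth = (bus width in bytes) x (transfers per second).
  # Ballpark, illustrative figures only.
  def bandwidth_gb_s(bus_bits, mt_s):
      return bus_bits / 8 * mt_s / 1000

  # Typical desktop APU: dual-channel DDR5-5600 (128-bit total)
  print(bandwidth_gb_s(128, 5600))   # ~89.6 GB/s, shared by CPU and GPU/NPU

  # PS5-style unified GDDR6: 256-bit at 14000 MT/s
  print(bandwidth_gb_s(256, 14000))  # ~448 GB/s, but higher latency for the CPU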

replies(3): >>40053947 #>>40054567 #>>40056075 #
Dalewyn ◴[] No.40053947[source]
>Unifying it again, like with AMD's APUs, means either compromises for the CPU or for the GPU. There's no free lunch.

I think the lunch here (it still ain't free) is that RAM speed means nothing if you don't have enough RAM in the first place, and this is a compromise solution to that practical problem.

replies(2): >>40054000 #>>40055288 #
Dylan16807 ◴[] No.40055288[source]
The real problem is a lack of competition in GPU production.

GDDR is not very expensive. You should be able to get a GPU with a mid-level chip and tons of memory, but it's just not offered. Instead, please pay triple or quadruple the price of a high-end gaming GPU to get a model with double the memory and basically the same core.

The level of markup is astonishing. I can go from 8GB to 16GB on AMD for $60, but going from 24GB to 48GB costs $3000. And Nvidia isn't any better.
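
Spelled out per gigabyte (quick sketch, prices as above):

  # Cost per extra gigabyte for the two upgrades above.
  upgrades = {
      "8GB -> 16GB": (60, 8),      # $60 for 8 extra GB
      "24GB -> 48GB": (3000, 24),  # $3000 for 24 extra GB
  }
  for name, (price_usd, extra_gb) in upgrades.items():
      print(f"{name}: ${price_usd / extra_gb:.2f} per extra GB")
  # 8GB -> 16GB: $7.50 per extra GB
  # 24GB -> 48GB: $125.00 per extra GB -- roughly 17x, for the same kind of memory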

replies(4): >>40055629 #>>40055989 #>>40056434 #>>40069867 #
nsteel ◴[] No.40056434[source]
This isn't my area, but won't it be quite expensive to use that GDDR? The PHY and the controller are complicated, and you've got 2GB devices at most, so if you want more memory you need a wider bus. That requires more beachfront and therefore a bigger die, which must make it expensive once you go beyond what fits on a small, cheap chip. Do their 24GB+ cards really use GDDR?*

And you need to ensure you don't shoot yourself in the foot by making anything (relatively) cheap that could be useful for AI...

*Edit: wow, yeah, they do! A 384-bit interface on some! Sounds hot.

replies(1): >>40059573 #
Dylan16807 ◴[] No.40059573[source]
The upgrade is actually remarkably similar.

The RX 7600 and RX 7600 XT have 8GB and 16GB respectively, attached to a 128-bit bus.

The RX 7900 XT and W7900 have 24GB and 48GB respectively, attached to a 384-bit bus.

Neither upgrade changes the bus width, only the memory chips.

The two upgraded models even use the same memory chips, as far as I can tell: GDDR6, 18000 MT/s, 2GB per 16 pins. I couldn't confirm the chip count on the W7900, but it's the same density.
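
Working the chip counts out from the bus widths (a sketch, assuming 2GB chips on 32 data pins in the base models and 16 in the doubled ones):

  # Same 2GB GDDR6 chips everywhere; only the data pins per chip change.
  GB_PER_CHIP = 2
  cards = {
      # name:       (bus_bits, data_pins_per_chip)
      "RX 7600":    (128, 32),
      "RX 7600 XT": (128, 16),
      "RX 7900 XT": (384, 32),
      "W7900":      (384, 16),
  }
  for name, (bus_bits, pins) in cards.items():
      chips = bus_bits // pins
      print(f"{name}: {chips} chips = {chips * GB_PER_CHIP}GB")
  # RX 7600: 4 chips = 8GB; RX 7600 XT: 8 chips = 16GB
  # RX 7900 XT: 12 chips = 24GB; W7900: 24 chips = 48GB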

replies(1): >>40064467 #
nsteel ◴[] No.40064467[source]
Perhaps I should have been clearer and said "if you want more memory than the 16GB model you need a wider bus", but this confirms what I described above:

  - cheap gfx die with a 128-bit memory interface.
  - vastly more expensive gfx die with a 384-bit memory interface.
Essentially there's a cheap upgrade option available for each die: swapping the 1GB memory chips for 2GB memory chips (GDDR doesn't support a mix). If you have the 16GB model and you want more memory, there are no bigger memory chips available, so you need a wider bus, and that's going to cost significantly more to produce, hence they charge more.

As a side note, I would expect the GDDR chips to be x32, rather than x16.

replies(1): >>40068306 #
Dylan16807 ◴[] No.40068306[source]
> Essentially there's a cheap upgrade option available for each die

Hmm, I think you missed what my actual point was.

You can buy a card with the cheap die for $270.

You can buy a card with the expensive die for $1000.

So far, so good.

The cost of just the memory upgrade for the cheap die is $60.

The cost of just the memory upgrade for the expensive die is $3000.

None of that $3000 is going toward upgrading the bus width. Maybe one percent of it goes toward upgrading the circuit board. It's a huge market-segmentation fee.

> As a side note, I would expect the GDDR chips to be x32, rather than x16.

The pictures I've found show the 7600 using 4 RAM chips and the 7600 XT using 8 RAM chips.

replies(1): >>40069347 #
nsteel ◴[] No.40069347[source]
I did miss your point - I didn't realize the hefty price tag was just for the memory. Thank you for not biting my head off! That is pretty mad. I'm struggling to think of any explanation other than yours. Even the few supporting upgrades the extra memory might require (power supplies etc.) wouldn't come anywhere near that cost.

I couldn't find much that actually shows how many chips are used, but https://www.techpowerup.com/review/sapphire-radeon-rx-7600-x... says H56G42AS8DX-014, which is a x32 part (https://product.skhynix.com/products/dram/gddr/gddr6.go?appT...). But either way, it can't explain that pricing!

replies(1): >>40071488 #
Dylan16807 ◴[] No.40071488[source]
The first two pictures on that page show 4 RAM chips on each side of the board.

https://media-www.micron.com/-/media/client/global/documents...

This document directly talks about splitting a 32-bit data connection across two GDDR6 chips, on page 7.

"Allows for a doubling of density. Two 8Gb devices appear to the controller as a single, logical 16Gb device with two 16-bite wide channels."

Do that with 16Gb chips and you match the GPUs we're talking about.
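
Sketching that doubling on a 384-bit card (illustrative, using the 16Gb density above):

  # Each 32-bit channel is either one chip on all 32 data pins,
  # or two chips on 16 pins each -- doubling capacity on the same bus.
  BUS_BITS = 384
  CHIP_GBIT = 16  # 16Gb = 2GB GDDR6 devices

  channels = BUS_BITS // 32  # 12 channels
  for chips_per_channel in (1, 2):
      total_gb = channels * chips_per_channel * CHIP_GBIT // 8
      print(f"{chips_per_channel} chip(s) per channel: {total_gb}GB")
  # 1 chip per channel: 24GB (like the 7900 XT); 2 per channel: 48GB (like the W7900)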

replies(1): >>40074037 #
nsteel ◴[] No.40074037{3}[source]
Of course - clamshell mode! They just don't connect half the data pins on each chip. That also explains how they fit it on the card so easily (and cheaply).
replies(1): >>40078610 #
Dylan16807 ◴[] No.40078610{4}[source]
Yeah, though a way to connect fewer data pins to each chip doesn't necessarily have to be clamshell; it just requires a little bit of flexibility in the design.