212 points by pella | 11 comments

btown No.42748940
I've often thought that one of the ways AMD could distinguish itself from NVIDIA is by bringing significantly higher amounts of VRAM (or memory systems as performant as what we currently know as VRAM) to the consumer space.

A card with a fraction of the FLOPS of cutting-edge graphics cards (and ideally proportionally less power consumption), but with 64-128GB of VRAM-equivalent, would be a game-changer for letting people experiment with large multi-modal models, and would seriously incentivize researchers to build the next generation of tensor abstraction libraries for both CUDA and ROCm/HIP. And for gaming, you could break new ground with high-resolution textures. AMD would be back in the game.
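
Back-of-the-envelope, the weight memory a model needs is roughly params × bytes/param. A rough sizing sketch (Python; the bytes-per-parameter figures for the quantized formats are approximate, and KV cache plus activations come on top):

    # Rough VRAM needed just to hold the weights: params * bytes/param.
    # The q8/q4 figures are approximate (quantized formats carry
    # per-block scale overhead); KV cache and activations add more.
    for params_b in (8, 70, 123):                      # billions of params
        for name, bpp in (("fp16", 2.0), ("q8", 1.06), ("q4", 0.56)):
            print(f"{params_b}B {name}: ~{params_b * bpp:.0f} GB")
    # e.g. 70B fp16 needs ~140 GB; 70B q8 at ~74 GB fits a 128GB card.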

Of course, if it's not real VRAM, it needs to be at least somewhat close on the latency and bandwidth front, so let's pop on over and see what's happening in this article...

> An Infinity Cache hit has a load-to-use latency of over 140 ns. Even DRAM on the AMD Ryzen 9 7950X3D shows less latency. Missing Infinity Cache of course drives latency up even higher, to a staggering 227 ns. HBM stands for High Bandwidth Memory, not low latency memory, and it shows.

Welp. Guess my wish isn't coming true today.
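
As an aside, you can feel load-to-use latency on your own machine with a dependent pointer chase, where each load's address comes from the previous load. A rough Python sketch; interpreter overhead inflates the absolute numbers, so only the gap between the cache-resident and DRAM-sized runs is meaningful (serious measurements use a C chase):

    # Dependent pointer chase: each load's address depends on the prior
    # load, so the latency can't be hidden. Python overhead inflates the
    # absolute ns/hop; the small-vs-large gap is the interesting part.
    import time
    import numpy as np

    def ns_per_hop(n, hops=1_000_000):
        nxt = np.random.permutation(n)     # random walk over n slots
        i, t0 = 0, time.perf_counter()
        for _ in range(hops):
            i = nxt[i]
        return (time.perf_counter() - t0) / hops * 1e9

    print(f"cache-resident: {ns_per_hop(1 << 12):.0f} ns/hop")
    print(f"DRAM-sized:     {ns_per_hop(1 << 26):.0f} ns/hop")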

replies(10): >>42749016 #>>42749039 #>>42749048 #>>42749096 #>>42749201 #>>42749629 #>>42749785 #>>42749805 #>>42752432 #>>42752946 #
1. mpercival531 No.42749048
They are. Strix Halo is going after the same space as the Apple M4 Pro/Max, where Apple is currently unchallenged. Pairing it with two 64GB LPCAMM2 modules will get you there.

Edit: The problem with AMD is less the hardware offerings and more that their compute software stack has historically handwaved consumer GPU support, or been very slow to deliver it; even more so for their APUs. Maybe the advent of the MI300A will change the equation, maybe not.

replies(2): >>42749929 #>>42752317 #
2. lhl No.42749929
I don't know of any Strix Halo devices with non-soldered memory, but both HP and Asus have announced 128GB SKUs (availability unknown).

For LLM inference, basically everything works w/ ROCm on RDNA3 now (well, Flash Attention is via Triton and doesn't support sliding-window attention and some other things; also, I mostly test on Linux, although I did check that the new WSL2 support works). I've tested some older APUs w/ basic benchmarking as well. Notes here for those interested: https://llm-tracker.info/howto/AMD-GPUs
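
If anyone wants a quick smoke test of that stack, here's a minimal sketch assuming a ROCm build of PyTorch (ROCm builds expose AMD GPUs through the regular "cuda" device name):

    # Minimal ROCm/PyTorch smoke test. Assumes a ROCm build of PyTorch,
    # which reuses the "cuda" device name for AMD GPUs.
    import torch

    print(torch.version.hip)               # HIP version string; None on CUDA builds
    print(torch.cuda.is_available())       # True if the GPU is visible
    print(torch.cuda.get_device_name(0))   # e.g. "Radeon RX 7900 XTX"

    x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    print((x @ x).float().abs().mean().item())   # exercises the GPU matmul path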

replies(1): >>42750062 #
3. UncleOxidant No.42750062
Thanks for that link. I'm interested in either the HP Z2 Mini G1a or an Nvidia Digits for LLM experimentation. The obvious advantage of the Digits is that the CUDA ecosystem is much more tried and true for that kind of thing. The disadvantages are that it would be hard to use as a replacement for my current PC, that it will run an already-old version of Ubuntu (22.04), and that you're dependent on Nvidia for updates.
replies(2): >>42750176 #>>42750988 #
4. lhl No.42750176{3}
Yeah, I think anyone w/ old Jetsons knows what it's like to be left high and dry by Nvidia's embedded software support; older models are basically just e-waste. Since the Digits won't be out until May, I guess there's enough time to wait and see, at least to get a sense of what the actual specs are. I have a feeling the FP16 TFLOPS and the memory bandwidth (MBW) are going to be much lower than what people have been hyping themselves up for.

Sadly, my feeling is that the big Strix Halo SKUs (which have no scheduled release dates) aren't going to be competitively priced (they're likely to be at a big FLOPS/real-world-performance disadvantage, and there's still the PITA factor), but there is something appealing about the do-it-all aspect of it.
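
For a rough sense of why MBW matters more than TFLOPS for local inference: single-stream decode is memory-bound, so tokens/s is capped at bandwidth divided by the bytes of weights read per token. A sketch with ballpark figures; the Digits bandwidth is an outright guess, since the specs aren't public:

    # Memory-bound decode: each generated token reads (roughly) all the
    # weights once, so tokens/s <= bandwidth / weight_bytes.
    weights_gb = 40                    # e.g. a ~70B model at ~4-bit
    systems = {
        "Strix Halo (256-bit LPDDR5X-8000)": 256,   # GB/s
        "Digits (assumed; unconfirmed)":     273,   # GB/s, a pure guess
        "RTX 4090 (GDDR6X)":                1008,   # GB/s
    }
    for name, bw in systems.items():
        print(f"{name}: <= {bw / weights_gb:.0f} tok/s")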

replies(1): >>42751178 #
5. KeplerBoy No.42750988{3}
Who said anything about Ubuntu 22.04? I mean, sure, that's the newest release the current JetPack comes with, but I'd be surprised if they shipped Digits with that.
replies(1): >>42751155 #
6. rbanffy No.42751155{4}
Doesn’t DGX OS use the latest LTS version? Current should be 24.04.
replies(1): >>42751395 #
7. rbanffy No.42751178{4}
DIGITS looks like a serious attempt, but they don't have much of an incentive to have people developing for older hardware. I wouldn't expect them to support it for more than five years. At least the underlying Ubuntu will last longer than that and provide a viable work environment well past the point where the hardware gets boring.
replies(1): >>42751801 #
8. KeplerBoy No.42751395{5}
I wouldn't know. I only work with workstation or Jetson stuff.

The DGX documentation and downloads aren't public, afaik.

Edit: Never mind, some information about DGX is public, and they really are on 22.04. Oh well, at least the deep learning stack is guaranteed to run.

https://docs.nvidia.com/base-os/too

9. UncleOxidant No.42751801{5}
If only they could get their changes upstreamed to Ubuntu (and any kernel mods upstreamed to the mainline kernel), then we wouldn't have to worry about it.
replies(1): >>42751874 #
10. rbanffy No.42751874{6}
Getting their kernel mods upstreamed is very unlikely, but they might provide just enough that you can build a new kernel with the same major version number.
11. Dylan16807 No.42752317
> Pairing it with two 64GB LPCAMM2 modules will get you there.

It gets you closer for sure. But while ~250GB/s is a whole lot better than SO-DIMMs at ~100GB/s, the new mid-tier GPUs are probably more like 640-900GB/s.
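
For reference, those figures fall straight out of bus width × transfer rate. A quick sketch; the GPU line assumes a 256-bit GDDR7 card at 28 Gbps per pin:

    # Peak DRAM bandwidth = bus width in bytes * transfer rate.
    def gb_per_s(bus_bits, mts):
        return bus_bits / 8 * mts / 1000

    print(gb_per_s(128, 5600))    # dual-channel DDR5-5600 SO-DIMMs: ~90 GB/s
    print(gb_per_s(256, 8000))    # Strix Halo, 256-bit LPDDR5X-8000: 256 GB/s
    print(gb_per_s(256, 28000))   # 256-bit GDDR7 at 28 Gbps/pin: 896 GB/s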