
212 points pella | 2 comments
btown ◴[] No.42748940[source]
I've often thought that one of the places AMD could distinguish itself from NVIDIA is bringing significantly higher amounts of VRAM (or memory systems that are as performant as what we currently know as VRAM) to the consumer space.

A card with a fraction of the FLOPS of cutting-edge graphics cards (and ideally proportionally less power consumption), but with 64-128GB VRAM-equivalent, would be a game changer for letting people experiment with large multi-modal models, and seriously incentivize researchers to build the next generation of tensor abstraction libraries for both CUDA and ROCm/HIP. And for gaming, you could break new ground on high-resolution textures. AMD would be back in the game.

Of course, if it's not real VRAM, it needs to be at least somewhat close on the latency and bandwidth front, so let's pop on over and see what's happening in this article...

> An Infinity Cache hit has a load-to-use latency of over 140 ns. Even DRAM on the AMD Ryzen 9 7950X3D shows less latency. Missing Infinity Cache of course drives latency up even higher, to a staggering 227 ns. HBM stands for High Bandwidth Memory, not low latency memory, and it shows.

Welp. Guess my wish isn't coming true today.

replies(10): >>42749016 #>>42749039 #>>42749048 #>>42749096 #>>42749201 #>>42749629 #>>42749785 #>>42749805 #>>42752432 #>>42752946 #
pkroll ◴[] No.42749039[source]
You're not the only one thinking that: https://www.nvidia.com/en-us/project-digits/

128GB of unified memory. $3K. Throw ollama and ComfyUI on that sucker and things could get interesting. The question is, how much slower than a 5090 is this gonna be? The memory bandwidth isn't going to match a 512-bit bus.
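
One rough way to frame the "how much slower" question, as a sketch rather than a benchmark: when generating tokens with a large model, each token needs roughly one full read of the weights, so memory bandwidth divided by model size gives a ceiling on tokens per second. The function name and every number below are made-up placeholders, not published specs for either device.

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if every token streams all weights once."""
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / weights_gb


# Placeholder figures for illustration only (assumptions, not real specs).
WEIGHTS_GB = 40.0           # hypothetical quantized model size
WIDE_BUS_GPU_GB_S = 1800.0  # hypothetical card with a 512-bit bus
UNIFIED_BOX_GB_S = 300.0    # hypothetical unified-memory box

fast = decode_tokens_per_sec_ceiling(WIDE_BUS_GPU_GB_S, WEIGHTS_GB)
slow = decode_tokens_per_sec_ceiling(UNIFIED_BOX_GB_S, WEIGHTS_GB)
print(f"{fast:.1f} vs {slow:.1f} tokens/sec ceiling, ratio {fast / slow:.1f}x")
```

The ratio depends only on the two bandwidths. It ignores compute limits and whether the model fits in memory at all, which is the whole appeal of a 128GB box.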

replies(4): >>42749113 #>>42750477 #>>42750999 #>>42756776 #
1. manojlds ◴[] No.42750477[source]
It's LPDDR5.
replies(1): >>42752524 #
2. ein0p ◴[] No.42752524[source]
That's actually a good thing. That's how you get a ton of DRAM without it costing a fortune. M2 Ultra is able to get GPU-like 800GB/sec out of LPDDR5. From that it follows that if you can design a specialized chip, you can get a respectable 1 TB/sec quite easily with the same kind of memory, provided that you're willing to design a chip to support a ton of memory channels (and potentially also a wider memory bus). In fact, I'm baffled that such devices don't already exist outside Apple's product line. Seems like a rather obvious thing to do, and Apple has a "proof of concept" already. I can think of at least four companies off the top of my head that could do it quite easily, besides Apple.
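
The arithmetic behind that claim can be sketched in a few lines: theoretical peak bandwidth is bus width in bytes times the transfer rate. The helper names are hypothetical, and the 1024-bit bus and 6400 MT/s figures for M2 Ultra are assumptions based on commonly reported numbers, not something stated in this thread.

```python
import math


def peak_bandwidth_gb_s(bus_width_bits: int, rate_mt_s: float) -> float:
    """Theoretical peak in GB/s: bytes per transfer times mega-transfers per second."""
    return bus_width_bits / 8 * rate_mt_s / 1000


def bus_width_needed_bits(target_gb_s: float, rate_mt_s: float) -> float:
    """Minimum total bus width in bits to reach target_gb_s at the given rate."""
    return target_gb_s * 1000 / rate_mt_s * 8


# Assumed M2 Ultra-like configuration: 1024-bit bus, LPDDR5 at 6400 MT/s.
print(peak_bandwidth_gb_s(1024, 6400))

# Bus width required for 1 TB/s at the same assumed data rate,
# rounded up to a whole number of hypothetical 16-bit channels.
needed = bus_width_needed_bits(1000, 6400)
channels = math.ceil(needed / 16)
print(needed, channels, channels * 16)
```

Under those assumptions the 1024-bit configuration comes out a little above 800GB/sec, and 1 TB/sec needs a bus roughly a quarter wider, which is why the channel count (not the memory type) is the hard part.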