
Basic Facts about GPUs

(damek.github.io)
338 points by ibobev | 10 comments
elashri ◴[] No.44366911[source]
Good article summarizing a good chunk of information people should have some idea about. I just want to comment that the title is a little misleading, because it really describes the specific choices NVIDIA makes in developing its GPU architectures, which is not always what others do.

For example, the arithmetic-intensity break-even point (the ridge point) is very different once you leave NVIDIA-land. Take the AMD Instinct MI300: up to ~160 TFLOPS of FP32 paired with ~6 TB/s of HBM3/3E bandwidth gives a ridge point near 27 FLOPs/byte, roughly double the A100's ~13 FLOPs/byte. The larger on-package HBM (128-256 GB) also shifts the practical trade-offs between tiling depth and occupancy. That said, it is very expensive and does not have CUDA (which can be good and bad at the same time).
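
For anyone who wants to check the arithmetic, here is a minimal sketch (plain C++ host code; the MI300 figures are the ones quoted above, the A100 figures are the publicly listed ~19.5 TFLOPS FP32 and ~1.5 TB/s HBM2e, so treat the outputs as rough roofline estimates, not datasheet numbers):

    #include <cstdio>

    // Ridge point = peak compute / peak memory bandwidth: the arithmetic
    // intensity (FLOPs per byte moved) at which a kernel stops being
    // bandwidth-bound on a roofline plot.
    static double ridge_point(double peak_tflops, double bandwidth_tbs) {
        return peak_tflops / bandwidth_tbs;  // TFLOP/s over TB/s = FLOPs/byte
    }

    int main() {
        printf("MI300: ~%.0f FLOPs/byte\n", ridge_point(160.0, 6.0));   // ~27
        printf("A100:  ~%.0f FLOPs/byte\n", ridge_point(19.5, 1.555));  // ~13
        return 0;
    }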

replies(2): >>44367014 #>>44380929 #
apitman ◴[] No.44367014[source]
Unfortunately, Nvidia GPUs are the only ones that matter until AMD starts taking their compute software seriously.
replies(2): >>44367150 #>>44368272 #
fooblaster ◴[] No.44367150[source]
They are. It's just not at the consumer hardware level.
replies(2): >>44368013 #>>44368161 #
tucnak ◴[] No.44368161{3}[source]
This misconception is repeated time and time again; software support for their datacenter-grade hardware is just as bad. I've had the displeasure of using the MI50, the MI100 (a lot), and the MI210 (very briefly). All three are supposedly enterprise-grade computing hardware, and yet it was a pathetic experience with a myriad of disconnected components that had to be patched and married with a very specific kernel version to get ANY kind of LLM inference going.

The last time I bothered with any of it was 9 months ago; enough is enough.

replies(1): >>44369737 #
fooblaster ◴[] No.44369737{4}[source]
This hardware is ancient history. The MI250 and MI300 are much better supported.
replies(1): >>44370312 #
1. tucnak ◴[] No.44370312{5}[source]
What a load of nonsense. The MI210 effectively hit the market in 2023, around the same time as the H100. We're talking about a datacenter-grade card that is two years out of date, and it's already "ancient history"?

No wonder nobody on this site trusts AMD.

replies(2): >>44370954 #>>44372754 #
2. bluescrn ◴[] No.44370954[source]
Unless you're, you know, using GPUs for graphics...

Xbox, Playstation, and Steam Deck seem to be doing pretty nicely with AMD.

replies(1): >>44372550 #
3. MindSpunk ◴[] No.44372550[source]
The number of people on this site who suddenly care about GPUs because of the explosion of LLMs, yet fail to understand that GPUs are _graphics_ processors designed for _graphics_ workloads, is insane. It almost feels like the popular opinion here is that graphics is just dead and that AMD and NVIDIA should throw everything else they do in the bin to chase the LLM bag.

AMD makes excellent graphics hardware, and their graphics tools are also fantastic. AMD's pricing and market positioning can be questionable, but the hardware is great. They're not as strong on machine-learning tasks, and they're in a follower position on tensor acceleration, but for graphics they are very solid.

replies(2): >>44372797 #>>44373861 #
4. fooblaster ◴[] No.44372754[source]
My experience with the MI300 does not mirror yours. If I have a complaint, it's that its performance does not live up to expectations.
5. almostgotcaught ◴[] No.44372797{3}[source]
The number of people on this site who think they understand modern GPUs because back in the day they wrote some OpenGL...

1. Both AMD and NVIDIA have "tensor core" ISA instructions (i.e., real silicon/data paths, not emulation) which have zero use case in graphics (see the sketch below).

2. Ain't no one playing video games on MI300/H100 etc., and the ISA/architecture reflects that.
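
To make point 1 concrete, here is a minimal CUDA sketch of what those instructions look like from the programmer's side: the nvcuda::wmma intrinsics, which lower to the tensor-core MMA/HMMA instructions (AMD's CDNA counterpart is the MFMA family). A toy 16x16x16 FP16 tile, not a tuned kernel; compile for sm_70 or newer and launch with a single warp, e.g. wmma_16x16x16<<<1, 32>>>(dA, dB, dC).

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // One warp computes a single 16x16 tile of C = A * B on the tensor cores.
    // Each wmma call below maps onto dedicated matrix-multiply datapaths.
    __global__ void wmma_16x16x16(const half* A, const half* B, float* C) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c;

        wmma::fill_fragment(c, 0.0f);
        wmma::load_matrix_sync(a, A, 16);   // leading dimension = 16
        wmma::load_matrix_sync(b, B, 16);
        wmma::mma_sync(c, a, b, c);         // the actual tensor-core op
        wmma::store_matrix_sync(C, c, 16, wmma::mem_row_major);
    }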

> but for graphics they are very solid.

Hmmm I wonder if AMD's overfit-to-graphics architectural design choices are a source of friction as they now transition to serving the ML compute market... Hmmm I wonder if they're actively undoing some of these choices...

replies(3): >>44373920 #>>44375258 #>>44375768 #
6. _carbyau_ ◴[] No.44373861{3}[source]
Just having fun with an out-of-context quote.

> graphics is just dead and AMD and NVIDIA should throw everything else they do in the bin to chase the LLM bag

No graphics means that games of the future will be like:

"You have been eaten by a ClautGemPilot."

7. MindSpunk ◴[] No.44373920{4}[source]
AMD isn't overfit to graphics. AMD's GPUs were friendly to general-purpose compute well before Nvidia's were. Hardware-wise, anyway. AMD's memory access system and resource binding model were well ahead of Nvidia's for a long time. When Nvidia was stuffing resource descriptors into special palettes with addressing limits, AMD was fully bindless under the hood. Everything was just one big address space, descriptors and data.

Nvidia 15 years ago was overfit to graphics. Nvidia just made smarter choices, sold more hardware and re-invested their winnings into software and improving their hardware. Now they're just as good at GPGPU with a stronger software stack.

AMD has struggled to be anything other than a follower in the market and has suffered quite a lot as a result, even in graphics. Mesh shaders in DX12 were the result of NVIDIA dictating a new execution model that was very favorable to their new hardware, while AMD had already had a similar (but not perfectly compatible) system, called primitive shaders, since Vega.

8. averne_ ◴[] No.44375258{4}[source]
Matrix instructions do of course have uses in graphics. One example of this is DLSS.
replies(1): >>44378631 #
9. lomase ◴[] No.44375768{4}[source]
Imagine thinking you know more than others because you use a different abstraction layer.
10. Agentlien ◴[] No.44378631{5}[source]
This feels backwards to me when GPUs were created largely because graphics needed lots of parallel floating point operations, a big chunk of which are matrix multiplications.

When I think of matrix multiplication in graphics I primarily think of transforms between spaces: moving vertices from object space to camera space, transforming from camera space to screen space, ... This is a big part of the math done in regular rendering and needs to be done for every visible vertex in the scene, typically millions of them in modern games.
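
To make that concrete, here is a toy sketch of the per-vertex work (written as a CUDA-style kernel rather than an actual HLSL/GLSL vertex shader; the float4x4 struct and kernel name are made up for the example, but the math is the same 4x4-matrix-times-vector multiply per vertex):

    #include <cuda_runtime.h>

    struct float4x4 { float m[4][4]; };  // row-major 4x4 transform

    // Apply a combined model-view-projection matrix to every vertex:
    // 16 multiply-adds per vertex, run in parallel across millions of vertices.
    __global__ void transform_vertices(const float4x4 mvp,
                                       const float4* in, float4* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float4 v = in[i];
        float r[4];
        for (int row = 0; row < 4; ++row)
            r[row] = mvp.m[row][0] * v.x + mvp.m[row][1] * v.y +
                     mvp.m[row][2] * v.z + mvp.m[row][3] * v.w;
        out[i] = make_float4(r[0], r[1], r[2], r[3]);
    }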

I suppose the difference here is that DLSS is a case where you primarily do large numbers of consecutive matrix multiplications with little other logic, since it's more ANN code than graphics code.