
Basic Facts about GPUs

(damek.github.io)
338 points by ibobev | 2 comments
elashri ◴[] No.44366911[source]
Good article summarizing good chunk of information that people should have some idea about. I just want to comment that the title is a little bit misleading because this is talking about the very choices that NVIDIA follows in developing their GPU archs which is not what always what others do.

For example, the arithmetic-intensity break-even point (the ridge point) is very different once you leave NVIDIA-land. The AMD Instinct MI300, with up to 160 TFLOPS FP32 paired with ~6 TB/s of HBM3/3E bandwidth, has a ridge point near 27 FLOPs/byte, about double the A100's 13 FLOPs/byte. The larger on-package HBM (128–256 GB) also shifts the practical trade-off between tiling depth and occupancy. That said, it is very expensive and does not have CUDA, which can be good and bad at the same time.
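The ridge-point arithmetic above is just peak FLOP/s divided by peak bytes/s. A minimal sketch, using the headline numbers quoted in this comment (real ridge points vary by SKU, clocks, and whether you count vector FP32 or matrix units; the A100 figures assume the 19.5 TFLOPS / 1.555 TB/s 40 GB part):

```python
def ridge_point(peak_tflops: float, bandwidth_tbs: float) -> float:
    """Arithmetic intensity (FLOPs/byte) at which a kernel stops being
    memory-bound: peak FLOP/s divided by peak bytes/s."""
    return (peak_tflops * 1e12) / (bandwidth_tbs * 1e12)

# Approximate headline numbers from the comment above:
mi300 = ridge_point(peak_tflops=160.0, bandwidth_tbs=6.0)    # ~26.7 FLOPs/byte
a100 = ridge_point(peak_tflops=19.5, bandwidth_tbs=1.555)    # ~12.5 FLOPs/byte

print(f"MI300 ridge point: {mi300:.1f} FLOPs/byte")
print(f"A100 ridge point:  {a100:.1f} FLOPs/byte")
```

A kernel whose arithmetic intensity falls below the ridge point is bandwidth-bound on that part, which is why the same kernel can be compute-bound on an A100 but memory-bound on an MI300.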

replies(2): >>44367014 #>>44380929 #
apitman ◴[] No.44367014[source]
Unfortunately, Nvidia GPUs are the only ones that matter until AMD starts taking its compute software seriously.
replies(2): >>44367150 #>>44368272 #
tucnak ◴[] No.44368272[source]
Unfortunately, GPUs are old news now. When it comes to perf/watt/dollar, TPUs are substantially ahead for both training and inference. There's a sparsity disadvantage with trailing-edge TPU devices such as the v4, but if you care about large-scale training of any sort, it's not even close. Additionally, Tenstorrent p300 devices are hitting the market soon, and there's a lot of promising stuff coming from the Xilinx side of the AMD shop: the recent Versal chips allow for AI compute-in-network capabilities that put NVIDIA BlueField's supposed programmability to shame. NVIDIA likes to say BlueField is a next-generation SmartNIC, but compared to the actually field-programmable Versal parts, it's more like a 100BASE-T card from the '90s.

I think it's very naive to assume that GPUs will continue to dominate the AI landscape.

replies(2): >>44369832 #>>44370305 #
menaerus ◴[] No.44369832[source]
So, where does one buy a TPU?
replies(1): >>44370398 #
tucnak ◴[] No.44370398[source]
The actual lead times on similarly capable GPU systems are so long that by the time your order is executed, you're already losing money. Even assuming perfect utilization and perfect after-market conditions, you won't make any money on the hardware anyway.

The buy-vs-rent calculus is only viable if there's no asymmetry between the two. Often, what you can rent you cannot buy, and vice versa: what you can buy, you could never rent. Even if you _could_ buy an actual TPU, you wouldn't be able to run it anyway, since it's all built around sophisticated networking and switching topologies[1]. The same goes for GPU deployments of comparable scale: what makes you think you could buy and run GPUs at scale?
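The break-even logic behind this argument can be sketched in a few lines. All prices here are made-up illustrative assumptions, not real quotes; the point is only that long lead times push the break-even horizon out past the hardware's useful life:

```python
def months_to_break_even(purchase_price: float,
                         monthly_rental: float,
                         monthly_opex: float) -> float:
    """Months of rental spend after which buying would have been cheaper.
    Ignores depreciation, utilization gaps, and delivery lead time, all of
    which make buying look worse."""
    saving_per_month = monthly_rental - monthly_opex
    if saving_per_month <= 0:
        return float("inf")  # owning never beats renting at these prices
    return purchase_price / saving_per_month

# Illustrative: a $250k system vs. $12k/mo rental with $4k/mo power + ops.
print(months_to_break_even(250_000, 12_000, 4_000))  # 31.25 months
```

Add a year of lead time before the system earns anything, and the effective horizon stretches well past typical accelerator refresh cycles, which is the asymmetry the comment is pointing at.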

It's a fantasy.

[1] https://arxiv.org/abs/2304.01433

replies(2): >>44370513 #>>44371092 #
almostgotcaught ◴[] No.44370513[source]
Is your answer to "where can I buy a TPU" that you can't buy a GPU either? That's a new one.

First of all, I don't understand how that's an answer. Second, it's laughably wrong: I can name five firms (outside of FAANG) off the top of my head with >1k Blackwell devices, and they're making very good money (have you ever heard of quantfi....). Third, how is the TPU going to conquer anything when (as you admit) you couldn't run one even if you could buy one?

replies(1): >>44371304 #
tucnak ◴[] No.44371304[source]
I never claimed that "TPU is going to conquer everything"; it's a matter of fact that the latest-generation TPU is currently the most cost-effective solution for large-scale training. I'm not even saying that NVIDIA has lost, just that GPUs have lost. Maybe NVIDIA comes up with a non-GPU system that includes programmable fabric for compute-in-network capabilities; sure, anything other than the BlueField nonsense. But it's already clear from an engineering standpoint that the formula of large HBM stacks attached to a "GPU" plus BlueField is over.
replies(1): >>44371369 #
almostgotcaught ◴[] No.44371369[source]
> NVIDIA has lost, just that GPUs have lost

i hope you realize how silly you sound when

1. NVDA's market cap is 70% more than GOOG's

2. there is literally not a single viable competitor to GPGPU among the 30 or so "accelerator" companies that all swear their thing will definitely be the one, even with many of them approaching 10 years in the market by now (Cerebras, SambaNova, Groq, d-Matrix, blah blah blah).