AMD now has more compute on the top 500 than Nvidia

(www.nextplatform.com)

pie420 ◴[18 Nov 24 20:14 UTC] No.42176400[source]▶

Layperson with no industry knowledge here, but it seems like Nvidia's CUDA moat will fall in the next 2-5 years. It seems impossible to sustain those margins without competition coming in and getting a decent slice of the pie.

replies(5): >>42176440 #>>42177575 #>>42177944 #>>42178259 #>>42179625 #

metadat ◴[18 Nov 24 20:17 UTC] No.42176440[source]▶

>>42176400 #

But how will AMD or anyone else push in? CUDA is actually a whole virtualization layer on top of the hardware and isn't easily replicable; Nvidia has been at it for 17 years.

You are right, eventually something's gotta give. The path for this next leg isn't yet apparent to me.

P.S. How much is an exaflop or petaflop, and how significant is it? The numbers thrown around in this article don't mean anything to me. Is this new cluster way more powerful than the previous top system?

replies(14): >>42176567 #>>42176711 #>>42176809 #>>42177061 #>>42177287 #>>42177319 #>>42177378 #>>42177451 #>>42177452 #>>42177477 #>>42177479 #>>42178108 #>>42179870 #>>42180214 #

1. bryanlarsen ◴[18 Nov 24 20:27 UTC] No.42176567[source]▶

>>42176440 #

Anybody spending tens of billions annually on Nvidia hardware is going to be willing to spend millions to port their software away from CUDA.

replies(3): >>42176963 #>>42177463 #>>42182571 #

2. echelon ◴[18 Nov 24 20:59 UTC] No.42176963[source]▶

>>42176567 (TP) #

For the average non-FAANG company, there's nothing to port to yet. We don't all have the luxury of custom TPUs.

3. talldayo ◴[18 Nov 24 21:47 UTC] No.42177463[source]▶

>>42176567 (TP) #

To slower hardware? What are they supposed to port to, ASICs?

replies(1): >>42177525 #

4. adgjlsfhk1 ◴[18 Nov 24 21:54 UTC] No.42177525[source]▶

>>42177463 #

If the hardware is 30% slower and 2x cheaper, that's a pretty great deal.
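A rough sketch of the arithmetic behind that claim, reading "30% slower" as 0.7x the throughput and "2x cheaper" as half the purchase price (both readings are assumptions, and the function name is made up for illustration):

```python
def perf_per_dollar_ratio(relative_speed: float, relative_price: float) -> float:
    """Throughput per dollar of the alternative, relative to the baseline.

    relative_speed: alternative throughput / baseline throughput
    relative_price: alternative price / baseline price
    """
    return relative_speed / relative_price


# "30% slower" -> 0.7x speed; "2x cheaper" -> 0.5x price (assumed readings).
ratio = perf_per_dollar_ratio(relative_speed=0.7, relative_price=0.5)
print(f"Throughput per dollar vs. baseline: {ratio:.2f}x")
```

Under those assumptions the cheaper part delivers more throughput per dollar spent on hardware. This ignores power, cooling, and porting costs.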

replies(1): >>42177861 #

5. selectodude ◴[18 Nov 24 22:37 UTC] No.42177861{3}[source]▶

>>42177525 #

Power density tends to be the limiting factor for this stuff, not money. If it's 30 percent slower per watt, it's useless.

replies(1): >>42178459 #

6. Wytwwww ◴[18 Nov 24 23:37 UTC] No.42178459{4}[source]▶

>>42177861 #

The ratio between power usage and GPU cost is very, very different than with CPUs, though. If you could save e.g. 20-30% of the purchase price that might make it worth it.

E.g. you could run an H100 at 100% utilization 24/7 for 1 year at $0.4 per kWh (so assuming significant overhead for infrastructure etc.) and that would only cost ~10% of the purchase price of the GPU itself.
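A back-of-the-envelope check of that estimate. The 700 W board power and the $30,000 purchase price are assumptions for illustration, not figures from this thread; only the $0.4 per kWh rate comes from the comment above.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_energy_cost(power_kw: float, price_per_kwh: float,
                       utilization: float = 1.0) -> float:
    """Electricity cost in dollars for one year of operation."""
    return power_kw * utilization * HOURS_PER_YEAR * price_per_kwh


def energy_to_purchase_ratio(power_kw: float, price_per_kwh: float,
                             purchase_price: float) -> float:
    """One year of electricity cost as a fraction of the purchase price."""
    return annual_energy_cost(power_kw, price_per_kwh) / purchase_price


GPU_POWER_KW = 0.7        # assumption: roughly 700 W at full load
GPU_PRICE_USD = 30_000    # assumption: illustrative purchase price
PRICE_PER_KWH = 0.4       # from the comment, inflated to cover overhead

cost = annual_energy_cost(GPU_POWER_KW, PRICE_PER_KWH)
ratio = energy_to_purchase_ratio(GPU_POWER_KW, PRICE_PER_KWH, GPU_PRICE_USD)
print(f"Annual electricity: ${cost:,.0f} ({ratio:.1%} of purchase price)")
```

With these inputs the yearly electricity bill comes to a single-digit percentage of the purchase price, in the same ballpark as the ~10% estimate. Changing either assumed figure shifts the ratio proportionally.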

replies(1): >>42179046 #

7. wbl ◴[19 Nov 24 00:53 UTC] No.42179046{5}[source]▶

>>42178459 #

The cost of power usage isn't the money; it's the capacity and cooling.

replies(1): >>42181611 #

8. Wytwwww ◴[19 Nov 24 09:42 UTC] No.42181611{6}[source]▶

>>42179046 #

Yes, I know that. Hence I quadrupled the price of electricity. Or are you saying that the cost of capacity and cooling doesn't scale directly with power usage?

We can increase that another 2x and the cost would still be relatively low compared to the price/depreciation of the GPU itself.

9. pjmlp ◴[19 Nov 24 12:10 UTC] No.42182571[source]▶

>>42176567 (TP) #

First they need to support everything that CUDA is capable of in its programming language portfolio, tooling, and libraries.

replies(1): >>42183003 #

10. bryanlarsen ◴[19 Nov 24 13:09 UTC] No.42183003[source]▶

>>42182571 #

A typical LLM might use about 0.1% of CUDA. That's all that would have to be ported to get that LLM to work.

replies(1): >>42183651 #

11. pjmlp ◴[19 Nov 24 14:15 UTC] No.42183651{3}[source]▶

>>42183003 #

Which misses the point of why CUDA has won.

Then again, maybe the goal is getting 0.1% of CUDA market share. /s

replies(2): >>42184109 #>>42184220 #

12. its_down_again ◴[19 Nov 24 14:59 UTC] No.42184109{4}[source]▶

>>42183651 #

In the words of Gilfoyle: I'll bite. Why has CUDA won?

replies(1): >>42184726 #

13. imtringued ◴[19 Nov 24 15:09 UTC] No.42184220{4}[source]▶

>>42183651 #

Nvidia has won because their compute drivers don't crash people's systems when they run e.g. Vulkan Compute.

You are mostly listing irrelevant nice-to-have things that aren't deal breakers. AMD's consumer GPUs have a long history of being abandoned a year or two after release.

replies(1): >>42184735 #

14. pjmlp ◴[19 Nov 24 15:44 UTC] No.42184726{5}[source]▶

>>42184109 #

CUDA C++, CUDA Fortran, CUDA Anything PTX, plus libraries, IDE integration, GPU graphical debugging.

Coupled with Khronos, Intel, and AMD never delivering anything comparable with OpenCL, Apple losing interest after Khronos didn't take OpenCL in the direction they wanted, and Google never adopting it, favouring their RenderScript dialect.

15. pjmlp ◴[19 Nov 24 15:45 UTC] No.42184735{5}[source]▶

>>42184220 #

CUDA C++, CUDA Fortran, CUDA Anything PTX, plus libraries, IDE integration, and GPU graphical debugging aren't just nice-to-have things.
