AMD now has more compute on the top 500 than Nvidia

(www.nextplatform.com)

198 points rbanffy | 1 comments | 18 Nov 24 18:54 UTC

amelius [18 Nov 24 21:27 UTC] No.42177249

Why the focus on AMD and Nvidia? It really isn't that hard to design a large number of ALU blocks into some silicon IP block and make them work together efficiently.

The real accomplishment is fabricating them.

replies(2): >>42177288 >>42177324

talldayo [18 Nov 24 21:34 UTC] No.42177324

>>42177249

> It really isn't that hard to design a large number of ALU blocks into some silicon IP block and make them work together efficiently.

It really is that hard, and the fabrication side of the issue is the easy part from Nvidia's perspective - you just pay TSMC a shitload of money. Nvidia's real victory (besides leading on performance-per-watt) is that their software stack doesn't suck. They invested in complex shader units and tensor accelerators that scale with the size of the card rather than being confined to puny, limited NPUs. CUDA unified this feature set and has been entrenched in the industry for almost a decade, which gave it pretty much any feature you could want, be it crypto acceleration or AI/ML primitives.

The ultimate tragedy is that there was a potential future where a Free and Open Source CUDA alternative existed. Apple wrote the OpenCL spec for exactly that purpose and gave it to Khronos, but later abandoned it to focus on... (checks clipboard) MLX and Metal Performance Shaders. Oh, what could have been if the industry weren't so stingy and shortsighted.

replies(3): >>42177458 >>42178281 >>42182786

david-gpu [18 Nov 24 23:20 UTC] No.42178281

>>42177324

> It really is that hard

YES!! Thank you!

> Nvidia's real victory (besides leading on performance-per-watt) is that their software stack doesn't suck

YES! And it's not just CUDA and CUDA-adjacent tools, but also their cuDNN/cuBLAS/etc. libraries. They put massive staffing into squeezing the last drop of performance out of their hardware, identifying areas for improvement and feeding that back to the architects.

> Apple wrote the OpenCL spec for exactly that purpose and gave it to Khronos

Nitpick: Affie Munshi from Apple wrote a draft and convinced his management to offer it to Khronos, where it was significantly modified over... was it a year or so?... by representatives from a dozen or so companies. A ton of smart people put a ton of work into what became the 1.0 version.

And let me tell you that the discussions were often tense, both during the official meetings as well as what happened behind the scenes. The end result was as good as you can expect from a large committee composed of representatives from competing companies.

But, in summary, you get it, unlike so many commenters on HN.

↑