195 points rbanffy | 18 comments

amelius ◴[] No.42177249[source]
Why the focus on AMD and Nvidia? It really isn't that hard to design a large number of ALU blocks into some silicon IP block and make them work together efficiently.

The real accomplishment is fabricating them.

replies(2): >>42177288 #>>42177324 #
1. talldayo ◴[] No.42177324[source]
> It really isn't that hard to design a large number of ALU blocks into some silicon IP block and make them work together efficiently.

It really is that hard, and the fabrication side of the issue is the easy part from Nvidia's perspective - you just pay TSMC a shitload of money. Nvidia's real victory (besides leading on performance-per-watt) is that their software stack doesn't suck. They invested in complex shader units and tensor accelerators that scale with the size of the card rather than being constrained to puny and limited NPUs. CUDA unified this feature set and has been industry-entrenched for almost a decade, which gave it pretty much any feature you could want, be it crypto acceleration or AI/ML primitives.

The ultimate tragedy is that there was a potential future where a Free and Open Source CUDA alternative existed. Apple wrote the OpenCL spec for exactly that purpose and gave it to Khronos, but later abandoned it to focus on... checks clipboard MLX and Metal Performance Shaders. Oh, what could have been if the industry weren't so stingy and shortsighted.

replies(3): >>42177458 #>>42178281 #>>42182786 #
2. amelius ◴[] No.42177458[source]
> you just pay TSMC a shitload of money

I guess with money you can win any argument ...

replies(2): >>42177535 #>>42182822 #
3. talldayo ◴[] No.42177535[source]
Sure, Apple did the same thing with TSMC's 5nm node. They still lost in performance-per-watt in direct comparison with Nvidia GPUs using Samsung's 8nm node. Money isn't everything, even when you have so much of it that you can deny your competitors access to the tech you use.

Nvidia's lead is not only cemented by dense silicon. Their designs are extremely competitive, perhaps even a generational leap over what their competitors offer.

replies(1): >>42177688 #
4. amelius ◴[] No.42177688{3}[source]
Let me phrase it differently.

If Nvidia pulls the plug we can still go to AMD and have a reasonable alternative.

If TSMC pulls the plug, however ...

replies(3): >>42178302 #>>42178959 #>>42182827 #
5. david-gpu ◴[] No.42178281[source]
> It really is that hard

YES!! Thank you!

> Nvidia's real victory (besides leading on performance-per-watt) is that their software stack doesn't suck

YES! And it's not just CUDA and CUDA-adjacent tools, but also their cuDNN/cuBLAS/etc. libraries. They invest a massive amount of staffing into squeezing the last drop of performance out of their hardware, identifying areas for improvement and feeding that back to the architects.

> Apple wrote the OpenCL spec for exactly that purpose and gave it to Khronos

Nitpick: Affie Munshi from Apple wrote down a draft and convinced his management to offer it to Khronos, where it was significantly modified over... was it a year or so?... by a number of representatives from a dozen companies or so. A ton of smart people contributed a ton of work to what became the 1.0 version.

And let me tell you that the discussions were often tense, both during the official meetings as well as what happened behind the scenes. The end result was as good as you can expect from a large committee composed of representatives from competing companies.

But, in summary, you get it, unlike so many commenters on HN.

6. david-gpu ◴[] No.42178302{4}[source]
Samsung's fabrication is about as good as TSMC. Or at least it was when I retired a few years ago.
7. talldayo ◴[] No.42178959{4}[source]
Then so what? It's whataboutism.

The practical answer is that all of FAANG would have to pick up the pieces once their supply chain is shattered. Samsung would quickly reach capacity with AMD and potentially Nvidia as priority customers, and Intel would be trying to court Nvidia and Apple as high-margin customers for some low-yield 18A contract. Depending on whether TSMC's Arizona foundry ever reaches operational capacity, it would be balancing orders from Nvidia and Apple the same way TSMC does today. Given the pitifully low investment, it's not really likely the Arizona facility will make a dent in the supply chain.

Fact is, Nvidia is well positioned to pick up the pieces even if 5nm-and-better processes go away for the next decade. The only question is whether or not people will continue to have demand for CUDA, and the answer has been "yes" since long before crypto and AI were popular. If TSMC were bombed tomorrow, Nvidia would still have demand for their product and they would still have the capacity to sell it. Their competition with AMD would be somewhat normalized, and Apple would be blown into the stratosphere upon realizing that they have to contract either Samsung or Intel to stay afloat. The implications for the American economy are a little upsetting, but there's nothing particularly world-ending about that scenario. It would be a sad day to be a Geekbench enthusiast, but life would go on.

replies(1): >>42182493 #
8. amelius ◴[] No.42182493{5}[source]
It could be. But I don't read anything about upcoming AI chip companies.

My prediction is there will be some strong competition for Nvidia in the coming years.

Since most people use CUDA through some other library (like Torch or TF), I think the dependence on CUDA isn't as strong as you make it seem.
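
To make that concrete, here is a rough sketch (mine, not from any of those projects' docs) of what typical framework-level code looks like. Nothing in it touches CUDA directly, so the same script runs on whichever backend the installed PyTorch build targets - Nvidia's CUDA, or AMD's ROCm, which PyTorch also exposes under the "cuda" device name:

    import torch

    # Use whatever accelerator the installed build supports, else fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # A toy model and batch (sizes made up); no hand-written CUDA kernels anywhere.
    model = torch.nn.Linear(1024, 10).to(device)
    x = torch.randn(32, 1024, device=device)
    logits = model(x)
    print(logits.shape, device)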

9. pjmlp ◴[] No.42182786[source]
The industry, meaning: Google decided to go with the RenderScript C99 dialect for Android, while Intel and AMD never delivered anything that could match the CUDA ecosystem (note the ecosystem part), and Khronos never understood the value of C++ and Fortran in HPC - they still don't with regard to Fortran.

Intel actually has proven to be more clever than AMD in that regard, as Data Parallel C++ (DPC++) builds on top of SYCL (it isn't only SYCL), and Intel Fortran now also does GPU offloading.

10. pjmlp ◴[] No.42182822[source]
Only if the execution follows.
11. pjmlp ◴[] No.42182827{4}[source]
What is the reasonable alternative to CUDA Fortran on AMD?

One example out of many I can point out from the CUDA ecosystem.

replies(2): >>42182969 #>>42185062 #
12. amelius ◴[] No.42182969{5}[source]
People use CUDA through a limited number of libraries, for example Torch and TensorFlow, so there isn't a really strong dependence on CUDA for many important applications.
replies(1): >>42183641 #
13. pjmlp ◴[] No.42183641{6}[source]
Some people working in machine learning do use CUDA via Torch and TensorFlow.
replies(1): >>42185491 #
14. my123 ◴[] No.42185062{5}[source]
AMD ships a Fortran OpenMP compiler with GPU offloading that works pretty well
replies(1): >>42185508 #
15. amelius ◴[] No.42185491{7}[source]
Yes, most people in ML, and this field is currently on an exponential growth curve.
replies(1): >>42185514 #
16. pjmlp ◴[] No.42185508{6}[source]
Made public 6 days ago.

https://www.phoronix.com/news/AMD-Next-Gen-Fortran-Compiler

replies(1): >>42186516 #
17. pjmlp ◴[] No.42185514{8}[source]
And a tiny percentage of why CUDA is as big as it is.
18. my123 ◴[] No.42186516{7}[source]
That's the next-gen one. The older one, based on classic Flang, has been in production for quite a while.