    1045 points mfiguiere | 24 comments
    1. AndrewKemendo ◴[] No.39345391[source]
    ROCm is not spelled out anywhere in their documentation, and the best answers in search come from GitHub rather than from AMD's official documents.

    "Radeon Open Compute Platform"

    https://github.com/ROCm/ROCm/issues/1628

    And they wonder why they are losing. Branding absolutely matters.

    replies(8): >>39345468 #>>39345472 #>>39345491 #>>39345516 #>>39345873 #>>39345947 #>>39345989 #>>39352446 #
    2. rtavares ◴[] No.39345468[source]
    Later in the same thread:

    > ROCm is a brand name for ROCm™ open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures).

    > Note, ROCm no longer functions as an acronym.

    replies(1): >>39347658 #
    3. marcus0x62 ◴[] No.39345472[source]
    That, and it only runs on a handful of their GPUs.
    replies(1): >>39345568 #
    4. alwayslikethis ◴[] No.39345491[source]
    I mean, I also had to look up what CUDA stands for.
    replies(2): >>39347306 #>>39351980 #
    5. phh ◴[] No.39345516[source]
    I have no idea what CUDA stands for, and I live just fine without knowing it.
    replies(4): >>39345887 #>>39346051 #>>39346549 #>>39348995 #
    6. NekkoDroid ◴[] No.39345568[source]
    If you are talking about the "supported" list of GPUs, those listed are only the ones they fully validate and QA test; others of the same generation are likely to work, though most likely with some bumps along the way. In one of the somewhat older Phoronix posts about ROCm, one of their engineers did say they are trying to expand the list of validated and QA'd cards, as well as to distinguish between "validated", "supported", and "non-functional".
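
    For reference, a quick way to check which architecture your own card actually reports (and compare it against that list) is to query the device properties through HIP. This is just a minimal sketch of mine, assuming a working ROCm/HIP install; the gcnArchName field is the gfx target the support-matrix entries ultimately refer to:

        // device_query.cpp -- minimal sketch, assuming hipcc and the HIP runtime are installed.
        // Build (typically): hipcc device_query.cpp -o device_query
        #include <hip/hip_runtime.h>
        #include <cstdio>

        int main() {
            int count = 0;
            if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
                std::printf("No HIP-capable devices found.\n");
                return 1;
            }
            for (int i = 0; i < count; ++i) {
                hipDeviceProp_t prop;
                hipGetDeviceProperties(&prop, i);
                // gcnArchName is the gfx target (e.g. "gfx1030") that the runtime
                // and libraries were built for -- or not, if the card is unsupported.
                std::printf("Device %d: %s (%s)\n", i, prop.name, prop.gcnArchName);
            }
            return 0;
        }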
    replies(1): >>39351587 #
    7. sorenjan ◴[] No.39345873[source]
    Funnily enough, it doesn't work on their RDNA ("Radeon DNA") hardware (with some exceptions, I think), but is instead aimed at their CDNA ("Compute DNA") hardware. If they came up with a new name today, it probably wouldn't include Radeon.

    AMD seems to be a firm believer in separating the consumer chips for gaming from the compute chips for everything else. This probably makes a lot of sense from a chip design and current business perspective, but I think it's shortsighted and a bad idea. GPUs are very competent compute devices, and basically wasting all that performance on "only" gaming is strange to me. AI and other compute are getting more and more important for things like image and video processing, language models, etc. Not only for regular consumers, but also for enthusiasts and developers, it makes a lot of sense to be able to use your 10 TFLOPS chip even when you're not gaming.

    While reading through the AMD CDNA whitepaper I saw this and got a good chuckle. "culmination of years of effort by AMD" indeed.

    > The computational resources offered by the AMD CDNA family are nothing short of astounding. However, the key to heterogeneous computing is a software stack and ecosystem that easily puts these abilities into the hands of software developers and customers. The AMD ROCm 4.0 software stack is the culmination of years of effort by AMD to provide an open, standards-based, low-friction ecosystem that enables productivity creating portable and efficient high-performance applications for both first- and third-party developers.

    https://www.amd.com/content/dam/amd/en/documents/instinct-bu...

    replies(1): >>39346337 #
    8. moffkalast ◴[] No.39345887[source]
    Cleverly Undermining Disorganized AMD
    9. atq2119 ◴[] No.39345947[source]
    My understanding is that there was some trademark silliness around "open compute", and AMD decided that instead of doing a full rebrand, they would stick to ROCm but pretend that it wasn't ever an acronym.
    replies(1): >>39346302 #
    10. slavik81 ◴[] No.39345989[source]
    That is intentional. We had to change the name. ROCm is no longer an acronym.
    replies(1): >>39346664 #
    11. rvnx ◴[] No.39346051[source]
    Countless Updates Developer Agony
    replies(2): >>39346752 #>>39350061 #
    12. michaellarabel ◴[] No.39346302[source]
    Yeah, it was due to the Open Compute Project AFAIK... though for a little while AMD was telling me they really meant to call it "Radeon Open eCosystem", before dropping that too, with many still using the original name.
    13. slavik81 ◴[] No.39346337[source]
    ROCm works fine on the RDNA cards. On Ubuntu 23.10 and Debian Sid, the system packages for the ROCm math libraries have been built to run on every discrete Vega, RDNA 1, RDNA 2, CDNA 1, and CDNA 2 GPU. I've manually tested dozens of cards and every single one worked. There were just a handful of bugs in a couple of the libraries that could easily be fixed by a motivated individual. https://slerp.xyz/rocm/logs/full/

    The system package for HIP on Debian has been stuck on ROCm 5.2 / clang-15 for a while, but once I get it updated to ROCm 5.7 / clang-17, I expect that all discrete RDNA 3 GPUs will work.
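
    As a concrete illustration (my own minimal sketch, assuming the hipcc and rocBLAS dev packages described above are installed; package and header names may differ slightly between releases), this is the kind of math-library code that runs on those cards:

        // saxpy_rocblas.cpp -- minimal sketch; assumes hipcc and the rocBLAS dev package.
        // Build (typically): hipcc saxpy_rocblas.cpp -lrocblas -o saxpy_rocblas
        #include <hip/hip_runtime.h>
        #include <rocblas/rocblas.h>   // <rocblas.h> on older ROCm releases
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 1 << 20;
            std::vector<float> x(n, 1.0f), y(n, 2.0f);

            // Allocate device buffers and copy the input vectors over.
            float *dx = nullptr, *dy = nullptr;
            hipMalloc(reinterpret_cast<void**>(&dx), n * sizeof(float));
            hipMalloc(reinterpret_cast<void**>(&dy), n * sizeof(float));
            hipMemcpy(dx, x.data(), n * sizeof(float), hipMemcpyHostToDevice);
            hipMemcpy(dy, y.data(), n * sizeof(float), hipMemcpyHostToDevice);

            rocblas_handle handle;
            rocblas_create_handle(&handle);

            // y = alpha * x + y, computed on the GPU by rocBLAS.
            const float alpha = 3.0f;
            rocblas_saxpy(handle, n, &alpha, dx, 1, dy, 1);

            hipMemcpy(y.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
            std::printf("y[0] = %f\n", y[0]);   // expect 5.0

            rocblas_destroy_handle(handle);
            hipFree(dx);
            hipFree(dy);
            return 0;
        }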

    replies(1): >>39349534 #
    14. smokel ◴[] No.39346549[source]
    Compute Unified Device Architecture [1]

    [1] https://en.wikipedia.org/wiki/CUDA

    15. AndrewKemendo ◴[] No.39346664[source]
    I assume you’re on the team if you’re saying “we”.

    Can you say why you had to change the name?

    16. egorfine ◴[] No.39346752{3}[source]
    This is the right definition.
    17. hasmanean ◴[] No.39347306[source]
    Compute unified device architecture ?
    18. ametrau ◴[] No.39347658[source]
    >> Note, ROCm no longer functions as an acronym.

    That is really dumb. Like LLVM.

    19. alfalfasprout ◴[] No.39348995[source]
    Crap, updates destroyed (my) application
    20. stonogo ◴[] No.39349534{3}[source]
    It doesn't matter to my lab whether it technically runs. According to https://rocm.docs.amd.com/projects/install-on-linux/en/lates... it only supports three commercially available Radeon cards (and four Radeon Pro cards) on Linux. Contrast this with CUDA, which supports literally every Nvidia card in the building, including the crappy NVS series and weirdo laptop GPUs, and it becomes basically impossible to convince anyone to develop for ROCm.
    21. hyperbovine ◴[] No.39350061{3}[source]
    Lost five hours of my life yesterday discovering the fact that "CUDA 12.3" != "CUDA 12.3 Update 2".

    (Yes, that's obvious, but not so obvious when your GPU applications submitted to a cluster start crashing randomly for no apparent reason.)

    22. machomaster ◴[] No.39351587{3}[source]
    They can say whatever they like, but actions are what matter, not wishes and promises. And the reality is that the list of supported GPUs has been unchanged since they first announced it a year ago.
    23. mjcohen ◴[] No.39351980[source]
    Can't Use Devices (by) AMD
    24. Farfignoggen ◴[] No.39352446[source]
    Lisa Su announced in a later presentation/event that ROCm is no longer an acronym, so "Radeon Open CoMpute" is no longer the definition there. But ROCm/HIP, CUDA/CUDA Tools, and oneAPI/Level Zero are essentially the same in coverage/scope for AMD, Nvidia, and Intel respectively, as far as GPU compute API support and HPC/accelerator workloads go.

    So there's a YouTube video from a supercomputing conference where the presenter goes over the support matrix for ROCm/HIP, CUDA/CUDA Tools, and oneAPI/Level Zero, and they are similar in scope there.
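
    To illustrate how close the ROCm/HIP and CUDA surfaces are (a sketch of my own, not from that talk): the runtime APIs map nearly one-to-one, which is what a thin alias layer, or the hipify tools, rely on. The gpu* macros below are made up for the example; the underlying hip*/cuda* calls are the real APIs:

        // gpu_shim.cu -- hedged sketch: one source, two stacks.
        // Build with either:  hipcc gpu_shim.cu   or   nvcc gpu_shim.cu
        #ifdef __HIPCC__                       // compiling with hipcc (ROCm)
          #include <hip/hip_runtime.h>
          #define gpuMalloc             hipMalloc
          #define gpuMemcpy             hipMemcpy
          #define gpuMemcpyHostToDevice hipMemcpyHostToDevice
          #define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
          #define gpuDeviceSynchronize  hipDeviceSynchronize
          #define gpuFree               hipFree
        #else                                  // compiling with nvcc (CUDA)
          #include <cuda_runtime.h>
          #define gpuMalloc             cudaMalloc
          #define gpuMemcpy             cudaMemcpy
          #define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
          #define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
          #define gpuDeviceSynchronize  cudaDeviceSynchronize
          #define gpuFree               cudaFree
        #endif
        #include <cstdio>

        // Kernel syntax is identical on both stacks.
        __global__ void scale(float* x, float s, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] *= s;
        }

        int main() {
            const int n = 1024;
            float host[1024];
            for (int i = 0; i < n; ++i) host[i] = 1.0f;

            float* dev = nullptr;
            gpuMalloc(reinterpret_cast<void**>(&dev), n * sizeof(float));
            gpuMemcpy(dev, host, n * sizeof(float), gpuMemcpyHostToDevice);

            scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);   // same launch syntax
            gpuDeviceSynchronize();

            gpuMemcpy(host, dev, n * sizeof(float), gpuMemcpyDeviceToHost);
            std::printf("host[0] = %f\n", host[0]);          // expect 2.0
            gpuFree(dev);
            return 0;
        }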