
1045 points mfiguiere | 5 comments
btown ◴[] No.39345221[source]
Why would this not be AMD’s top priority among priorities? Someone recently likened the situation to an Iron Age where NVIDIA owns all the iron. And this sounds like AMD knowing about a new source of ore and not even being willing to sink a single engineer’s salary into exploration.

My only guess is they have a parallel skunkworks working on the same thing, but in a way that they can keep it closed-source - that this was a hedge they think they no longer need, and they are missing the forest for the trees on the benefits of cross-pollination and open source ethos to their business.

replies(14): >>39345241 #>>39345302 #>>39345393 #>>39345400 #>>39345458 #>>39345853 #>>39345857 #>>39345893 #>>39346210 #>>39346792 #>>39346857 #>>39347433 #>>39347900 #>>39347927 #
fariszr ◴[] No.39345241[source]
According to the article, AMD pulled the plug on this because they believe it would hinder adoption of ROCm 6, which, by the way, still officially supports only two consumer cards out of their entire lineup.[1]

1. https://www.phoronix.com/news/AMD-ROCm-6.0-Released

replies(4): >>39345503 #>>39345558 #>>39346200 #>>39346480 #
kkielhofner ◴[] No.39345558[source]
With the most recent of those cards being their one-year-old flagship ($1k) consumer GPU...

Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released. They'll even go as far as adding support for new GPU/compute families to older CUDA versions (see Hopper/Ada support in CUDA 11.8).

You can go out and buy any Nvidia GPU the day of release, take it home, plug it in, and everything just works. This is what people expect.

AMD seems to have no clue that this level of usability is what it will take to actually compete with Nvidia, and that's a real shame: their hardware is great.

replies(5): >>39345774 #>>39345894 #>>39346438 #>>39346550 #>>39346788 #
1. roenxi ◴[] No.39345894[source]
You've got to remember that AMD is behind in all aspects of this, including documenting their work in an easily digestible way.

"Support" means that the card is actively tested and presumably has some sort of SLA-style push to fix bugs for. As their stack matures, a bunch of cards that don't have official support will work well [0]. I have an unsupported card. There are horrible bugs. But the evidence I've seen is that the card will work better with time even though it is never going to be officially supported. I don't think any of my hardware is officially supported by the manufacturer, but the kernel drivers still work fine.

> Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released...

A lot of older Nvidia cards don't support CUDA v9 [1]. It isn't like everything supports everything, particularly in the early part of building out capability. The impression I'm getting is that in practice the gap in strategy here is not as large as the current state makes it seem.

[0] If anyone has bought an AMD card for their machine to multiply matrices, they've been gambling on whether the capability is there. This comment is reasonable speculation, but I want to caveat the optimism by asserting that I'm not going to put money into AMD compute until there is some actual evidence on the table that GPU lockups are rare.

[1] https://en.wikipedia.org/wiki/CUDA#GPUs_supported

replies(3): >>39346802 #>>39347041 #>>39347408 #
2. spookie ◴[] No.39346802[source]
To be fair, if anything, that table still shows you'll have compatibility with at least 3 major releases. Either way, I agree their strategy is getting results, it just takes time. I do prefer their open source commitment, I just hope they continue.
3. paulmd ◴[] No.39347041[source]
All versions of CUDA support PTX, an intermediate bytecode/compiler representation that can be final-compiled even by a CUDA 1.0-era driver.

So the contract is: as long as your future program does not touch any intrinsics etc. that do not exist in CUDA 1.0, you can export the new program from CUDA 27.0 as PTX, and the 8800 GTX driver will read the PTX and let your GPU run it as CUDA 1.0 code... so it is quite literally just as they describe: unlimited forward and backward capability/support, as long as you go through PTX in the middle.

https://docs.nvidia.com/cuda/archive/10.1/parallel-thread-ex...

https://en.wikipedia.org/wiki/Parallel_Thread_Execution
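Concretely, the guarantee that's easiest to demonstrate runs forward: PTX generated for an old virtual architecture can be JIT-compiled by newer drivers for GPUs that didn't exist when the binary was built. A sketch with nvcc (the `-gencode` flags are real nvcc options; the file and program names are made up for illustration):

```shell
# Build a fatbinary containing both a native sm_50 binary and the
# compute_50 PTX (the second -gencode keeps the PTX in the fatbinary).
# File names here are hypothetical.
nvcc -gencode arch=compute_50,code=sm_50 \
     -gencode arch=compute_50,code=compute_50 \
     -o saxpy saxpy.cu

# On a GPU newer than sm_50, the driver JIT-compiles the embedded PTX
# at load time, so this old binary runs on hardware released years
# after it was built.
./saxpy
```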

4. ColonelPhantom ◴[] No.39347408[source]
CUDA dropped Tesla (from 2006!) only as of 7.0, which seems to have been released around 2015. Fermi support lasted from 2010 until 2017, giving it a solid 7 years. Kepler support was dropped around 2020, and the first Kepler cards were released in 2012.

As such, Fermi seems to be the shortest-supported architecture, and it was still around for 7 years. GCN4 (Polaris) was introduced in 2016 and seems to have been officially dropped around 2021, just 5 years in. While you can still get it working with various workarounds, I don't see evidence of Nvidia being even remotely as hasty as AMD in removing support, even for early architectures like Tesla and Fermi.

replies(1): >>39347791 #
5. hedgehog ◴[] No.39347791[source]
On top of this, some Kepler support (for K80s etc.) is still maintained in CUDA 11, which was last updated in late 2022, and libraries like PyTorch and TensorFlow still support CUDA 11.8 out of the box.