AMD funded a drop-in CUDA implementation built on ROCm: It's now open-source

(www.phoronix.com)

1045 points mfiguiere | 3 comments | 12 Feb 24 14:00 UTC

btown ◴[12 Feb 24 14:37 UTC] No.39345221[source]▶

Why would this not be AMD’s top priority among priorities? Someone recently likened the situation to an Iron Age where NVIDIA owns all the iron. And this sounds like AMD knowing about a new source of ore and not even being willing to sink a single engineer’s salary into exploration.

My only guess is they have a parallel skunkworks working on the same thing, but in a way that they can keep it closed-source - that this was a hedge they think they no longer need, and they are missing the forest for the trees on the benefits of cross-pollination and open source ethos to their business.

replies(14): >>39345241 #>>39345302 #>>39345393 #>>39345400 #>>39345458 #>>39345853 #>>39345857 #>>39345893 #>>39346210 #>>39346792 #>>39346857 #>>39347433 #>>39347900 #>>39347927 #

modeless ◴[12 Feb 24 16:33 UTC] No.39346857[source]▶

>>39345221 #

I've been critical of AMD's failure to compete in AI for over a decade now, but I can see why AMD wouldn't want to go the route of cloning CUDA and I'm surprised they even tried. They would be on a never ending treadmill of feature catchup and bug-for-bug compatibility, and wouldn't have the freedom to change the API to suit their hardware.

The right path for AMD has always been to make their own API that runs on all of their own hardware, just as CUDA does for Nvidia, and push support for that API into all the open source ML projects (but mostly PyTorch), while attacking Nvidia's price discrimination by providing features they use to segment the market (e.g. virtualization, high VRAM) at lower price points.

Perhaps one day AMD will realize this. It seems like they're slowly moving in the right direction now, and all it took for them to wake up was Nvidia's market cap skyrocketing to 4th in the world on the back of their AI efforts...

replies(1): >>39346947 #

matchagaucho ◴[12 Feb 24 16:39 UTC] No.39346947[source]▶

>>39346857 #

But AMD was formed to shadow Intel's x86?

replies(2): >>39347014 #>>39348696 #

modeless ◴[12 Feb 24 16:43 UTC] No.39347014[source]▶

>>39346947 #

ISAs are smaller and less stateful and better documented and less buggy and most importantly they evolve much more slowly than software APIs. Much more feasible to clone. Especially back when AMD started.

replies(1): >>39347205 #

paulmd ◴[12 Feb 24 16:57 UTC] No.39347205[source]▶

>>39347014 #

PTX is just an ISA too. Programming languages and ISA representations are effectively fungible; that’s the lesson of Microsoft CLR/Intermediate Language and Java too. A “machine” is a hardware and a language.

replies(1): >>39347262 #

1. modeless ◴[12 Feb 24 17:00 UTC] No.39347262[source]▶

>>39347205 #

PTX is not a hardware ISA though, it's still software and can change more rapidly.

replies(1): >>39347707 #

2. paulmd ◴[12 Feb 24 17:33 UTC] No.39347707[source]▶

>>39347262 (TP) #

Not without breaking the support contract? If you change the PTX format then CUDA 1.0 machines can no longer run it and it's no longer PTX.

Again, you are missing the point. Java is both a language (Java source) and a machine (the JVM). The latter is a hardware ISA - there are processors that implement Java bytecode as their ISA format. Most people who run Java are not doing so on Java-machine hardware, yet they are still using the Java ISA in the process.

https://en.wikipedia.org/wiki/Java_processor

https://en.wikipedia.org/wiki/Bytecode#Execution

any bytecode is an ISA, the bytecode spec defines the machine and you can physically build such a machine that executes bytecode directly. Or you can translate via an intermediate layer, like how Transmeta Crusoe processors executed x86 as bytecode on a VLIW processor (and how most modern x86 processors actually use RISC micro-ops inside).

these are completely fungible concepts. They are not quite the same thing but bytecode is clearly an ISA in itself. Any given processor can choose to use a particular bytecode as either an ISA or translate it to its native representation, and this includes PTX, Java bytecode, and x86 (among all other bytecodes). And you can do the same for any other ISA (x86 as bytecode representation, etc).
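
To make the "execute directly or translate" distinction concrete, here is a minimal sketch. The three-opcode stack bytecode and the names `PROGRAM`, `interpret`, and `translate` are all hypothetical, made up for illustration; nothing here is PTX, JVM, or x86. The same program is run once by a machine that treats the bytecode as its ISA, and once by translating it into a different "native" form (a Python expression) first.

```python
# Hypothetical stack bytecode, invented for this illustration only.
# Each instruction is an (opcode, argument) pair.
PROGRAM = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]


def interpret(program):
    """Execute the bytecode directly, as a machine built for it would."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def translate(program):
    """Translate the same bytecode into another representation
    (here a Python expression string) instead of executing it."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(repr(arg))
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            symbol = "+" if op == "add" else "*"
            stack.append(f"({a} {symbol} {b})")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


direct_result = interpret(PROGRAM)
native_form = translate(PROGRAM)
translated_result = eval(native_form)  # run the translated form on the host
```

Both paths compute the same value from the same bytecode spec, which is the sense in which the spec, not the silicon, defines the machine. The sketch says nothing about how stable a given bytecode is over time, which is the other half of the disagreement in this thread.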

furthermore, what most people think of as "ISAs" aren't necessarily so. For example RDNA2 is an ISA family - different processors have different capabilities (for example 5500XT has mesh shader support while 5700XT does not) and the APUs use a still different ISA internally etc. GFX1101 is not the same ISA as GFX1103 and so on. These are properly implementations not ISAs, or if you consider it to be an ISA then there is also a meta-ISA encompassing larger groups (which also applies to x86's numerous variations). But people casually throw it all into the "ISA" bucket and it leads to this imprecision.

like many things in computing, it's all a matter of perspective/position. where is the boundary between "CMT core within a 2-thread module that shares a front-end" and "SMT thread within a core with an ALU pinned to one particular thread"? It's a matter of perspective. Where is the boundary of "software" vs "hardware" when virtually every "software" implementation uses fixed-function accelerator units and every fixed-function accelerator unit is running a control program that defines a flow of execution and has schedulers/scoreboards multiplexing the execution unit across arbitrary data flows? It's a matter of perspective.

replies(1): >>39349189 #

3. modeless ◴[12 Feb 24 19:30 UTC] No.39349189[source]▶

>>39347707 #

You are missing the point. PTX is not designed as a vendor neutral abstraction like JVM/CLR bytecode. Furthermore CUDA is a lot more than PTX. There's a whole API there, plus applications ship machine code and rely on Nvidia libraries which can be prohibited from running on AMD by license and with DRM, so those large libraries would also become part of the API boundary that AMD would have to reimplement and support.

Chasing CUDA compatibility is a fool's errand when the most important users of CUDA are open source. Just add explicit AMD support upstream and skip the never ending compatibility treadmill, and get better performance too. And once support is established and well used the community will pitch in to maintain it.
