So, same mistake Intel made before.
If the implementation, which is now very clearly working well, continues to perform as it does, the community may be able to keep it funded and functioning.
And the other side of this is that, with renewed AMD interest in and support for ROCm/HIP, it might be just good enough as a stopgap to push projects towards ROCm/HIP adoption (another blurb from the readme is included below).
> I am a developer writing CUDA code, does this project help me port my code to ROCm/HIP?
> Currently no, this project is strictly for end users. However this project could be used for a much more gradual porting from CUDA to HIP than anything else. You could start with an unmodified application running on ZLUDA, then have ZLUDA expose the underlying HIP objects (streams, modules, etc.), allowing to rewrite GPU kernels one at a time. Or you could have a mixed CUDA-HIP application where only the most performance sensitive GPU kernels are written in the native AMD language.
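For a sense of how mechanical that kind of port usually is: HIP mirrors the CUDA runtime API almost one-to-one, so a kernel moved over often differs only in the API prefix. A minimal, hypothetical vector-add sketch (my own illustration, not code from ZLUDA; built with hipcc):

```cpp
#include <hip/hip_runtime.h>  // the CUDA version would include <cuda_runtime.h>
#include <vector>
#include <cstdio>

// The kernel body is identical in CUDA and HIP.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    // Host API calls map 1:1: cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, ...
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // hipcc accepts the familiar triple-chevron launch syntax.
    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    // Blocking copy back; implicitly waits for the kernel on the default stream.
    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("hc[0] = %f\n", hc[0]);  // expect 3.0

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

AMD's hipify tools already automate most of that renaming; the long tail is the library ecosystem (cuBLAS, cuDNN and friends) and hand-tuned kernels, which is exactly where the readme's "mixed CUDA-HIP application" suggestion comes in.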
Who at AMD was responsible for this project, and why has he still not been fired?? How brain-dead does someone have to be to turn down the majority of the market??
If AMD could get 90% of the CUDA ML stuff to seamlessly run on AMD hardware, and could provide hardware at a competitive cost-per-performance (which I assume they probably could since NVIDIA must have an insane profit margin on their GPUs), wouldn't that be the opportunity to eat NVIDIA's lunch?
Either they are very stupid, or open sourcing the library stops NVidia from suing them in a repeat of the Oracle/Google lawsuit over Java APIs?
I'm not sure what the reason is.
Ryzen was a surprise to everyone not because it was good, but because they didn't fuck it up within two generations.
AMD cards have more raw compute than Nvidia's; on paper they're the better hardware, yet the software is so bad that I gave up on using it and switched to Nvidia. Two weeks of debugging driver errors vs. 30 minutes of automated updates.
At least Nvidia, which I fucking hate, will happily hold out their hand for cash even from individuals.
So now we’re in a hilarious situation where people from hobbyists to enterprise devs are hoping for intel to save the day.
Time will tell if that strategy is going to pan out. Ceding the ML "training" market entirely to Nvidia is certainly a bold move.
A better level to target compatibility would be the framework level, such as PyTorch, where the building blocks of neural networks (convolution, multi-head attention, etc.) are high-level and abstract enough to allow flexibility in mapping them onto AMD hardware without compromising performance.
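To make that concrete, here's a minimal sketch of what targeting the framework level buys (using PyTorch's C++ front end, libtorch, to keep one language in this thread; the Python API has the same shape). The user only names abstract building blocks, and, as I understand it, the ROCm build of PyTorch exposes its HIP devices through the same CUDA device type, so nothing in this code is vendor-specific:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    // The ROCm build of PyTorch reports HIP devices through the same "CUDA"
    // device type, so this selection is backend-agnostic user code.
    torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                       : torch::Device(torch::kCPU);

    // Building blocks like convolution are declared at this abstract level;
    // which vendor kernel runs underneath is the framework's problem.
    torch::nn::Conv2d conv(torch::nn::Conv2dOptions(/*in=*/3, /*out=*/16, /*kernel=*/3).padding(1));
    conv->to(device);

    auto x = torch::randn({1, 3, 224, 224}, device);
    auto y = conv->forward(x);
    std::cout << y.sizes() << std::endl;  // expect [1, 16, 224, 224]
    return 0;
}
```

All the vendor-specific work (which convolution kernel actually runs) is pushed down into the framework backend, which is why it looks like a plausible compatibility target.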
However, these frameworks are forever changing, and playing continual catch-up there still wouldn't be a great place to be, especially without a large staff dedicated to the effort (writing hand-optimized kernels), which AMD doesn't seem able or willing to muster.
So, finally, perhaps the strategically best place for AMD to invest would be in compilers and software tools to allow kernels to be written in a high level language. Becoming a first class Mojo target wouldn't be a bad place to start, assuming they are not already in partnership.
AMD cannot keep up with arbitrarily changing hardware and software while trying to please developers that want what was just released. They would always be a generation behind at tremendous expense.
The situation in reality is actually quite bad.
Given that I have an M2 Max and no Nvidia cards, I've tried enough PyTorch-based ML libraries that by now I basically expect them to flat-out error with "CUDA 10.x+ is required" once the dependencies are installed (e.g. the bitsandbytes library -- in fairness, there's apparently some effort to port that one to other platforms as well).
As of today, the whole field is moving so fast that it's simply not worth it for a solo dev or even a small team to attempt getting a non-CUDA stack up and running, especially with the other major GPU vendors not hiring (or not able to hire?) people to port the hand-optimized CUDA kernels.
Hopefully the situation will change after these couple of years of frenzy, but in the meantime I don't see any viable way to avoid a CUDA stack if one is serious about getting ML work done.