
1045 points | mfiguiere | 3 comments
Cu3PO42
I'm really rooting for AMD to break the CUDA monopoly, but I genuinely don't know whether a translation layer is a good thing or not. On the upside, it instantly makes the hardware much more viable and will boost adoption; on the downside, you run the risk that devs never support ROCm natively, because you can just use the translation layer.

I think this is essentially the same situation as Proton+DXVK for Linux gaming. Proton is a net positive for Linux, but I'm less sure about this case: getting good performance out of GPU compute requires much more tuning to the concrete architecture, which I'm afraid devs just won't do for AMD GPUs through this layer, always leaving them behind their Nvidia counterparts.

However, AMD desperately needs to do something. Story time:

On the weekend I wanted to play around with Stable Diffusion. Why pay for cloud compute when I have a powerful GPU at home, I thought. Said GPU is a 7900 XTX, i.e. the most powerful consumer card from AMD at this time. Only very few AMD GPUs are supported by ROCm right now, but thankfully mine is.

So, how hard could it possibly be to get Stable Diffusion running on my GPU? Hard. I don't think my problems were actually caused by AMD: I had ROCm installed and my card recognized by rocminfo in a matter of minutes. But the whole ML world is so focused on Nvidia that it took me ages to get a working installation of PyTorch and friends. The InvokeAI installer, for example, asks whether you want to use CUDA or ROCm, but then always installs the CUDA variant no matter what you answer. Ultimately, I did get a model to load, but the software crashed my graphical session before generating a single image.
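
For anyone hitting the same wall: the usual failure mode is silently ending up with a CUDA-only PyTorch wheel. Here's the sanity check that would have saved me hours (a minimal sketch; the rocm5.7 index URL is just one example from PyTorch's install matrix, so adjust it to whatever ROCm version you actually have):

    # Install the ROCm build explicitly rather than trusting an installer:
    #   pip3 install torch --index-url https://download.pytorch.org/whl/rocm5.7
    import torch

    # A ROCm build sets torch.version.hip; a CUDA build leaves it as None.
    print("HIP runtime:", torch.version.hip)

    # ROCm reuses the CUDA device API, so this should report True on a working setup.
    print("GPU available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))

If torch.version.hip comes back None, you got the CUDA wheel, and no amount of driver fiddling will fix it.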

The whole experience left me frustrated and wanting to buy an Nvidia GPU again...

bntyhntr
I would love to have a native Stable Diffusion experience; my RX 580 takes 30s to generate a single image. But it does work after following https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...

I got this up and running on my Windows machine in short order, and I don't even know what Stable Diffusion is.

But again, it would be nice to have first-class support to participate in the fun locally.

Cu3PO42
I have heard that DirectML was a somewhat easier story, but allegedly has worse performance (and obviously it's Windows-only...). But I'm not entirely surprised that setup is somewhat easier on Windows, where bundling everything is an accepted approach.
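
For comparison, my understanding is that the torch-directml wrapper looks roughly like this (a sketch; I haven't run it myself, and the matmul is just a stand-in workload):

    import torch
    import torch_directml

    # DirectML is exposed as an explicit device object rather than through torch.cuda.
    dml = torch_directml.device()

    # Tensors must be moved to the DML device by hand.
    x = torch.randn(3, 3).to(dml)
    print(x.device, (x @ x).sum().item())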

With AMD's official 15GB(!) Docker image, I was now able to get the A1111 UI running. With SD 1.5 and 30 sample iterations, generating an image takes under 2s. I'm still struggling to get InvokeAI running.
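
In case anyone wants to reproduce that number, the workload was essentially equivalent to this diffusers sketch run inside the container (model ID, dtype, and prompt are my choices for a typical SD 1.5 setup, not anything AMD ships):

    import time
    import torch
    from diffusers import StableDiffusionPipeline

    # ROCm builds of PyTorch expose the GPU through the usual "cuda" device
    # string, so no AMD-specific code is needed here.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    start = time.time()
    image = pipe("a watercolor fox", num_inference_steps=30).images[0]
    print(f"generated in {time.time() - start:.1f}s")
    image.save("fox.png")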

washadjeffmad
That has to include the model(s), no?

Also, nothing is easier on Windows. It's a wonder that anything works there, except for the power of recalcitrance.

Not dogging Windows users, but once your brain heals, it just can't go back.

Cu3PO42
It actually doesn't include the models! The image is Ubuntu with ROCm and a number of ML libraries, such as Torch, preinstalled.

> Also, nothing is easier on Windows.

As much as I, too, dislike Windows, I still have to disagree. I have encountered (proprietary) software that was much easier to get working on Windows. For example, Cisco AnyConnect with smart-card authentication has been a nightmare for me on Linux.