AMD funded a drop-in CUDA implementation built on ROCm: It's now open-source

(www.phoronix.com)

1045 points mfiguiere | 1 comments | 12 Feb 24 14:00 UTC

Cu3PO42 ◴[12 Feb 24 16:09 UTC] No.39346489[source]▶

I'm really rooting for AMD to break the CUDA monopoly. To this end, I genuinely don't know whether a translation layer is a good thing or not. On the upside, it makes the hardware much more viable instantly and will boost adoption. On the downside, you run the risk that devs will never support ROCm, because they can just use the translation layer.

I think this is essentially the same situation as Proton+DXVK for Linux gaming. I think that that is a net positive for Linux, but I'm less sure about this. Getting good performance out of GPU compute requires much more tuning to the concrete architecture, which I'm afraid devs just won't do for AMD GPUs through this layer, always leaving them behind their Nvidia counterparts.

However, AMD desperately needs to do something. Story time:

On the weekend I wanted to play around with Stable Diffusion. Why pay for cloud compute, when I have a powerful GPU at home, I thought. Said GPU is a 7900 XTX, i.e. the most powerful consumer card from AMD at this time. Only very few AMD GPUs are supported by ROCm at this time, but mine is, thankfully.

So, how hard could it possibly be to get Stable Diffusion running on my GPU? Hard. I don't think my problems were actually caused by AMD: I had ROCm installed and my card recognized by rocminfo in a matter of minutes. But the whole ML world is so focused on Nvidia that it took me ages to get a working installation of PyTorch and friends. The InvokeAI installer, for example, asks if you want to use CUDA or ROCm, but then always installs the CUDA variant whatever you answer. Ultimately, I did get a model to load, but the software crashed my graphical session before generating a single image.

The whole experience left me frustrated and wanting to buy an Nvidia GPU again...

replies(10): >>39346714 #>>39347956 #>>39348258 #>>39349464 #>>39349658 #>>39350019 #>>39350273 #>>39351237 #>>39354496 #>>39433413 #

sophrocyne ◴[12 Feb 24 22:11 UTC] No.39351237[source]▶

>>39346489 #

Hey there -

I'm a maintainer (and CEO) of Invoke.

It's something we're monitoring as well.

ROCm has been challenging to work with - we're actively talking to AMD to keep apprised of ways we can mitigate some of the more troublesome experiences that users have with getting Invoke running on AMD (and hoping to expand official support to Windows AMD).

The problem is that a lot of the solutions proposed involve significant/unsustainable dev effort (i.e., supporting an entirely different inference paradigm), rather than "drop in" for the existing Torch/diffusers pipelines.

While I don't know enough about your setup to offer immediate solutions, if you join the Discord, I'm sure folks would be happy to try walking through some manual troubleshooting/experimentation to get you up and running - discord.gg/invoke-ai

replies(2): >>39351457 #>>39352272 #

Cu3PO42 ◴[12 Feb 24 23:48 UTC] No.39352272[source]▶

>>39351237 #

Hi! I really appreciate you taking the time to reply.

I have since gotten Invoke to run and was already able to get some results I'm really quite happy with, so thank you for your time and commitment working on Invoke!

I understand that ROCm is still challenging, but it seems my problems were less related to ROCm or Invoke itself and more to Python dependency management. It really boiled down to getting the correct (ROCm) versions of packages installed. Installing Invoke from PyPI always removed my Torch and installed CUDA-enabled Torch (as well as cuBLAS, cuDNN, ...). Once I had the correct versions of packages, everything just worked.
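For anyone hitting the same thing, here is a minimal sketch of how one might check which Torch variant actually ended up installed after a dependency resolve. It assumes the wheel came from a PyTorch package index, where the version carries a local label such as `+rocm5.7` or `+cu121`; a plain version with no label (as on PyPI) is reported as unknown. The function names `classify_torch_build` and `installed_torch_build` are made up for illustration and are not part of Torch or Invoke.

```python
from importlib import metadata


def classify_torch_build(version: str) -> str:
    """Guess the accelerator backend from a Torch version string.

    Assumption: the build variant is encoded in the local version label,
    i.e. the part after "+", such as "rocm5.7", "cu121" or "cpu".
    """
    _, _, local = version.partition("+")
    local = local.lower()
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return "cuda"
    if local.startswith("cpu"):
        return "cpu"
    # No label at all, or one this sketch does not know about.
    return "unknown"


def installed_torch_build() -> str:
    """Classify the Torch distribution installed in the current environment."""
    try:
        return classify_torch_build(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print(installed_torch_build())
```

Running this right after each install step shows which step swaps the ROCm build for the CUDA one. It only reads package metadata, so it works without importing Torch or touching the GPU.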

To me, your pyproject.toml looks perfectly sane, so I wasn't sure how to go about fixing the problem.

What ended up working for me was to use one of AMD's ROCm OCI base images, manually install all dependencies, forgo a virtual environment, clone your repo (and build the frontend), and then install from there.

The majority of my struggle would have been solved by a recent working Docker image containing a working setup. (The one on Docker Hub is 9 months old.) Trying to build the Dockerfile from your repo, I also ended up with a CUDA-enabled Torch. It did install the correct one first, but in a later step removed the ROCm-enabled Torch to switch it for the CUDA-enabled one.

I hope you'll consider investing some resources into publishing newer, working builds of your Docker image.

replies(3): >>39353370 #>>39353550 #>>39364975 #

sophrocyne ◴[13 Feb 24 01:54 UTC] No.39353370[source]▶

>>39352272 #

You bet - Thanks for the feedback. Glad you're enjoying Invoke!

We do have Docker packages hosted on GH, but I'll be the first to admit that we haven't prioritized ROCm. Contributors who have AMDs are a scant few, but maybe we'll find some help in wrangling that problem now that we know there's an avenue to do so.

replies(2): >>39355543 #>>39362959 #

Cu3PO42 ◴[13 Feb 24 21:17 UTC] No.39362959[source]▶

>>39353370 #

As promised in my other comment, I did send a PR! https://github.com/invoke-ai/InvokeAI/pull/5714
