AMD funded a drop-in CUDA implementation built on ROCm: It's now open-source

(www.phoronix.com)

1045 points mfiguiere | 2 comments | 12 Feb 24 14:00 UTC

btown ◴[12 Feb 24 14:37 UTC] No.39345221[source]▶

Why would this not be AMD’s top priority among priorities? Someone recently likened the situation to an Iron Age where NVIDIA owns all the iron. And this sounds like AMD knowing about a new source of ore and not even being willing to sink a single engineer’s salary into exploration.

My only guess is they have a parallel skunkworks working on the same thing, but in a way that they can keep it closed-source - that this was a hedge they think they no longer need, and they are missing the forest for the trees on the benefits of cross-pollination and open source ethos to their business.

replies(14): >>39345241 #>>39345302 #>>39345393 #>>39345400 #>>39345458 #>>39345853 #>>39345857 #>>39345893 #>>39346210 #>>39346792 #>>39346857 #>>39347433 #>>39347900 #>>39347927 #

hjabird ◴[12 Feb 24 15:26 UTC] No.39345853[source]▶

>>39345221 #

The problem with effectively supporting CUDA is that it encourages CUDA adoption all the more strongly. Meanwhile, AMD will always be playing catch-up, forever having to patch issues, work around Nvidia/AMD differences, and accept the performance penalty that comes from having code optimised for another vendor's hardware. AMD needs to encourage developers to use their own ecosystem or an open standard.

replies(13): >>39345944 #>>39346147 #>>39346166 #>>39346182 #>>39346270 #>>39346295 #>>39346339 #>>39346835 #>>39346941 #>>39346971 #>>39347964 #>>39348398 #>>39351785 #

bick_nyers ◴[12 Feb 24 16:39 UTC] No.39346941[source]▶

>>39345853 #

The latest version of CUDA is 12.3, and version 12.2 came out 6 months prior. How many people are running an older version of CUDA right now on NVIDIA hardware for whatever particular reason?

Even if AMD lagged support on CUDA versioning, I think it would be widely accepted if the performance per dollar at certain price points was better.

Taking the whole market from NVIDIA is not really an option, it's better to attack certain price points and niches and then expand from there. The CUDA ship sailed a long time ago in my view.

replies(3): >>39347633 #>>39348092 #>>39348793 #

swozey ◴[12 Feb 24 17:27 UTC] No.39347633[source]▶

>>39346941 #

I just went through this this weekend. If you're running on Windows and want to use DeepSpeed, you still have to use CUDA 12.1, because DeepSpeed 0.13.1 is the latest that works with 12.1. There's no DeepSpeed for Windows that works with 12.3.

I tried to get it working this weekend, but it was a huge PITA, so I switched to putting everything into WSL2, then running PyTorch etc. in containers on Arch in there, so I can flip versions easily now that I know how SPECIFIC the versions are to one another.

I'm still working on that part, halfway into it my WSL2 completely broke and I had to reinstall windows. I'm scared to mount the vhdx right now. I did ALL of my work and ALL of my documentation is inside of the WSL2 archlinux and NOT on my windows machine. I have EVERYTHING I need to quickly put another server up (dotfiles, configs) sitting in a chezmoi git repo ON THE VM. That I only git committed one init like 5 mins into everything. THAT was a learning experience, now I have no idea if I should follow the "best practice" of keeping projects in wsl or having wsl reach out to windows, there's a performance drop. The 9p networking stopped working and no matter what I reinstalled, reset, removed features, reset windows, etc, it wouldn't start. But at least I have that WSL2 .vhdx image that will hopefully mount and start. And probably break WSL2 again. I even SPECIFICALLY took backups of the image as tarballs every hour in case I broke LINUX, not WSL.

If anyone has done sd containers in WSL2 already, let me know. I've tried to use WSL for dev work (I use OS X) like this 2-3 times in the last 4-5 years, and I always run into some catastrophically broken thing that makes my WSL stop working. I hadn't used it in years, so I hoped it was super reliable by now. This is on 3 different desktops with completely different hardware, etc. I was terrified it would break this weekend and IT DID. At least I can be up in Windows in 20 minutes thanks to Chocolatey and chezmoi. Wiped out my entire gaming desktop.

Sorry I'm venting now this was my entire weekend.

This repo is from a deepspeed contrib (iirc) and lists the reqs for deepspeed + windows that mention the version matches

https://github.com/S95Sedan/Deepspeed-Windows

> conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
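A minimal sketch of checking an environment against those pins, using only the Python standard library. The assumptions here are not from the repo: `check_pins` and `PINS` are hypothetical names, the conda package `pytorch` is assumed to show up under the distribution name `torch`, and local build tags such as `+cu121` are ignored when comparing. It does not check the `pytorch-cuda` metapackage.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the conda command above; "torch" as the installed
# distribution name for the conda package "pytorch" is an assumption.
PINS = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_pins(pins, get_version=version):
    """Return {name: (found_version_or_None, matches_pin)} for each pin."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        # Drop a local build tag like "+cu121" before comparing.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pins(PINS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: wanted {PINS[name]}, found {found} -> {status}")
```

Running it inside each container before installing DeepSpeed would flag a drifted version early, which is cheaper than finding out at build time.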

It may sound weird to do any of this in Windows, or maybe not, but if it does just remember that it's a lot of gamers like me with 4090s who just want to learn ML stuff as a hobby. I have absolutely no idea what I'm doing but thank god I know containers and linux like the back of my hand.

replies(3): >>39347855 #>>39362476 #>>39395549 #

bick_nyers ◴[12 Feb 24 17:43 UTC] No.39347855[source]▶

>>39347633 #

Vent away! Sounds frustrating for sure.

As much as I love Microsoft/Windows for the work they have put into WSL, I ended up just putting Kubuntu on my devices and use QEMU with GPU passthrough whenever I need Windows. Gaming perf is good. You need an iGPU or a cheap second GPU for Linux in order to hand off a 4090 etc. to Windows (unless maybe your motherboard happens to support headless boot but if it's a consumer board it doesn't). Dual boot with Windows always gave me trouble.

replies(2): >>39348016 #>>39348602 #

katbyte ◴[12 Feb 24 17:54 UTC] No.39348016[source]▶

>>39347855 #

I recently gave this a go as I’d not had a windows desktop for a long time, have a beefy Proxmox server and wanted to play some windows only games - works shockingly well with an a4000 and 35m optical hdmi cables! - however I’m getting random audio crackling and popping and I’ve yet to figure out what’s causing it.

First I thought it was hardware related, but it also happens in a Remote Desktop session, leading me to think it's some weird audio driver thing.

Have you encountered anything like this at all?

replies(1): >>39348652 #

swozey ◴[12 Feb 24 18:45 UTC] No.39348652[source]▶

>>39348016 #

What are you running for audio? pipewire+jack, pipewire, jack2, pulseaudio? I wonder if it's from latency. Pulseaudio is the most common but if you do any audio engineering or play guitar etc with your machine we all use jack protocol for less latency.

https://linuxmusicians.com/viewtopic.php?t=25556

Could be completely unrelated though, RDP sessions can definitely act up, get audio out of sync etc. I try to never do pass through rdp audio, it's not even enabled by default in the mstsc client IIRC but that may just be a "probably server" thing.

replies(1): >>39358625 #

1. katbyte ◴[13 Feb 24 15:42 UTC] No.39358625[source]▶

>>39348652 #

I have tried Optical usb cable to kvm to dac, audio over hdmi, and audio over rdp. All have the same crackle

replies(1): >>39360488 #

2. swozey ◴[13 Feb 24 17:51 UTC] No.39360488[source]▶

>>39358625 (TP) #

Oh, it's every single RDP connection? That's definitely not normal for RDP at all. I used to be a Windows engineer, so I RDP'd a LOT. RDP was our ssh, lol.

Crackle would happen so rarely that, while I KNOW it definitely happened, it wasn't like an every-2-days thing; it was probably more like once in 6 months or a year.
