    AMD GPU Debugger

    (thegeeko.me)
    276 points by ibobev | 12 comments
    1. whalesalad ◴[] No.46195861[source]
    Tangent: is anyone using a 7900 XTX for local inference/diffusion? I finally installed Linux on my gaming pc, and about 95% of the time it is just sitting off collecting dust. I would love to put this card to work in some capacity.
    replies(8): >>46196239 #>>46196248 #>>46196327 #>>46196716 #>>46197073 #>>46197542 #>>46197967 #>>46202515 #
    2. qskousen ◴[] No.46196239[source]
    I've done it with a 6800XT, which should be similar. It's a little trickier than with an Nvidia card (because everything is designed for CUDA) but doable.
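
    A minimal sketch of checking that a ROCm build of PyTorch actually sees the card; the HSA_OVERRIDE_GFX_VERSION hint is an assumption, only relevant for cards ROCm doesn't officially list:

        # Unsupported consumer cards sometimes need HSA_OVERRIDE_GFX_VERSION set in
        # the environment before importing torch (an assumption; not always needed).
        import torch

        # ROCm builds of PyTorch expose the AMD GPU through the torch.cuda API (via HIP).
        print("HIP runtime:", torch.version.hip)          # None on CUDA/CPU-only builds
        print("GPU visible:", torch.cuda.is_available())
        if torch.cuda.is_available():
            print("Device:", torch.cuda.get_device_name(0))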
    3. FuriouslyAdrift ◴[] No.46196248[source]
    You'd be much better off with any decent Nvidia card than with the 7900 series.

    AMD doesn't have a unified architecture across its graphics and compute lines the way Nvidia does.

    AMD's compute cards are sold under the Instinct line and are vastly more powerful than its consumer GPUs.

    Supposedly, they are moving back to a unified architecture in the next generation of GPU cards.

    replies(1): >>46198984 #
    4. Joona ◴[] No.46196327[source]
    I tested some image and text generation models, and generally things just worked after replacing the default torch libraries with AMD's ROCm variants.
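
    For the image side, a rough sketch of what "just working" looks like once the ROCm torch build is in place; the model id is only an example, and "cuda" here is the HIP-backed AMD device:

        import torch
        from diffusers import DiffusionPipeline

        # Under a ROCm build of torch, "cuda" maps to the AMD GPU via HIP.
        pipe = DiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1",   # example model id
            torch_dtype=torch.float16,
        ).to("cuda")

        image = pipe("a watercolor of a red panda", num_inference_steps=25).images[0]
        image.save("out.png")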
    5. universa1 ◴[] No.46196716[source]
    Try it with ramalama [1]. It worked fine here with a 7840U and a 6900 XT.

    [1] https://ramalama.ai/
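
    ramalama is driven from the CLI, but once it's serving a model a client is only a few lines. This sketch assumes `ramalama serve <model>` exposes an OpenAI-compatible endpoint on localhost:8080 (its usual backends, llama.cpp and vLLM, both speak that protocol); the port and model name are placeholders:

        from openai import OpenAI

        # Placeholder endpoint/model; point these at whatever `ramalama serve` reports.
        client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
        reply = client.chat.completions.create(
            model="default",
            messages=[{"role": "user", "content": "Hello from a 6900 XT"}],
        )
        print(reply.choices[0].message.content)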

    6. Gracana ◴[] No.46197073[source]
    I bought one when they were pretty new and had issues with ROCm (IIRC I was getting kernel oopses due to GPU OOMs) when running LLMs. It worked mostly fine with ComfyUI unless I tried to do especially esoteric stuff. From what I've heard lately, though, it should work just fine.
    7. jjmarr ◴[] No.46197542[source]
    I've been using it for a few years on Gentoo. There were challenges with the Python stack two years ago, but over the past year it's stabilized, and I can even do img2video, which is the most demanding local inference task I've tried so far.

    Performance-wise, the 7900 XTX is still the most cost-effective way of getting 24 GB of VRAM that isn't a sketchy VRAM mod. And VRAM is the main performance barrier, since the models worth running only barely fit in memory.

    Highly suggest checking out TheRock. There's been a big rearchitecting of ROCm to improve the UX/quality.
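
    A rough back-of-envelope for the "barely fits" point, using illustrative parameter counts and quantization widths rather than specific models:

        # Rough sizing of quantized weights alone; KV cache and activations come on top.
        def weight_gib(params_billion: float, bits_per_weight: float) -> float:
            return params_billion * 1e9 * bits_per_weight / 8 / 2**30

        for params, bits in [(7, 16), (13, 8), (32, 4.5), (70, 4.5)]:
            print(f"{params:>3}B @ {bits} bits/weight ~ {weight_gib(params, bits):5.1f} GiB")

    On 24 GB, a ~30B model at 4-5 bits per weight fits with room left for context, while 70B-class models don't fit without offloading.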

    replies(1): >>46200978 #
    8. veddan ◴[] No.46197967[source]
    For LLMs, I just pulled the latest llama.cpp and built it, and I haven't had any issues with it. This was quite recent, though; things used to be a lot worse, as I understand it.
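
    Once it's built, llama.cpp's bundled server gives you an OpenAI-compatible endpoint. A minimal client sketch, assuming something like `llama-server -m model.gguf` is running on the default port 8080 (model path and prompt are placeholders):

        import requests

        # Assumes `llama-server -m <model>.gguf` is listening on the default port.
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={
                "messages": [{"role": "user", "content": "Summarize ROCm in one sentence."}],
                "max_tokens": 64,
            },
            timeout=120,
        )
        print(resp.json()["choices"][0]["message"]["content"])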
    9. shmerl ◴[] No.46198984[source]
    tinygrad disagrees.
    replies(1): >>46199841 #
    10. aystatic ◴[] No.46199841{3}[source]
    Name three things using tinygrad that aren't openpilot.
    11. androiddrew ◴[] No.46200978[source]
    I bought a Radeon R9700. 32 GB of VRAM, and it does a good job.
    12. bialpio ◴[] No.46202515[source]
    I've only played with using a 7900 XT for locally hosting LLMs via Ollama (this is on Windows, mind you), and things worked fine - e.g. devstral:24b was decently fast. I haven't had time to use it for anything even semi-serious, though, so I can't comment on how useful it actually is.
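
    For anything beyond the CLI, the ollama Python package talks to that same local instance. A minimal sketch using the model tag mentioned above (the prompt is a placeholder, and the model must already be pulled):

        import ollama

        # Assumes the Ollama service is running locally and the model has been pulled.
        resp = ollama.chat(
            model="devstral:24b",
            messages=[{"role": "user", "content": "Explain ROCm vs CUDA in two sentences."}],
        )
        print(resp["message"]["content"])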