    AMD GPU Debugger

    (thegeeko.me)
    276 points by ibobev | 12 comments
    1. whalesalad ◴[] No.46195861[source]
    Tangent: is anyone using a 7900 XTX for local inference/diffusion? I finally installed Linux on my gaming pc, and about 95% of the time it is just sitting off collecting dust. I would love to put this card to work in some capacity.
    replies(8): >>46196239 #>>46196248 #>>46196327 #>>46196716 #>>46197073 #>>46197542 #>>46197967 #>>46202515 #
    2. qskousen ◴[] No.46196239[source]
    I've done it with a 6800XT, which should be similar. It's a little trickier than with an Nvidia card (because everything is designed for CUDA) but doable.
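
    A minimal sketch of checking that a ROCm build of PyTorch actually sees the card; the HSA_OVERRIDE_GFX_VERSION hint is an assumption, only relevant for cards ROCm doesn't officially list:

        # Unsupported consumer cards sometimes need HSA_OVERRIDE_GFX_VERSION set in
        # the environment before importing torch (an assumption; not always needed).
        import torch

        # ROCm builds of PyTorch expose the AMD GPU through the torch.cuda API (via HIP).
        print("HIP runtime:", torch.version.hip)          # None on CUDA/CPU-only builds
        print("GPU visible:", torch.cuda.is_available())
        if torch.cuda.is_available():
            print("Device:", torch.cuda.get_device_name(0))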
    3. FuriouslyAdrift ◴[] No.46196248[source]
    You'd be much better off with any decent Nvidia card than with the 7900 series.

    AMD doesn't have a unified architecture across its graphics and compute lines the way Nvidia does.

    AMD's compute cards are sold under the Instinct line and are vastly more powerful than its consumer GPUs.

    Supposedly, they are moving back to a unified architecture in the next generation of GPU cards.

    replies(1): >>46198984 #
    4. Joona ◴[] No.46196327[source]
    I tested some image and text generation models, and generally things just worked after replacing the default torch libraries with AMD's ROCm variants.
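
    For the image side, a rough sketch of what "just working" looks like once the ROCm torch build is in place; the model id is only an example, and "cuda" here is the HIP-backed AMD device:

        import torch
        from diffusers import DiffusionPipeline

        # Under a ROCm build of torch, "cuda" maps to the AMD GPU via HIP.
        pipe = DiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1",   # example model id
            torch_dtype=torch.float16,
        ).to("cuda")

        image = pipe("a watercolor of a red panda", num_inference_steps=25).images[0]
        image.save("out.png")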
    5. universa1 ◴[] No.46196716[source]
    Try it with ramalama [1]. It worked fine here with a 7840U and a 6900 XT.

    [1] https://ramalama.ai/
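
    ramalama is driven from the CLI, but once it's serving a model a client is only a few lines. This sketch assumes `ramalama serve <model>` exposes an OpenAI-compatible endpoint on localhost:8080 (its usual backends, llama.cpp and vLLM, both speak that protocol); the port and model name are placeholders:

        from openai import OpenAI

        # Placeholder endpoint/model; point these at whatever `ramalama serve` reports.
        client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
        reply = client.chat.completions.create(
            model="default",
            messages=[{"role": "user", "content": "Hello from a 6900 XT"}],
        )
        print(reply.choices[0].message.content)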

    6. Gracana ◴[] No.46197073[source]
    I bought one when they were pretty new and had issues with ROCm (IIRC I was getting kernel oopses due to GPU OOMs) when running LLMs. It worked mostly fine with ComfyUI unless I tried to do especially esoteric stuff. From what I've heard lately, though, it should work just fine.
    7. jjmarr ◴[] No.46197542[source]
    I've been using it for a few years on Gentoo. There were challenges with the Python stack two years ago, but over the past year it's stabilized, and I can even do img2video, which is the most demanding local inference task I've tried so far.

    Performance-wise, the 7900 XTX is still the most cost-effective way of getting 24 GB of VRAM that isn't a sketchy VRAM mod. And VRAM is the main performance barrier, since the models worth running only barely fit in memory.

    Highly suggest checking out TheRock. There's been a big rearchitecting of ROCm to improve the UX/quality.
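
    A rough back-of-envelope for the "barely fits" point, using illustrative parameter counts and quantization widths rather than specific models:

        # Rough sizing of quantized weights alone; KV cache and activations come on top.
        def weight_gib(params_billion: float, bits_per_weight: float) -> float:
            return params_billion * 1e9 * bits_per_weight / 8 / 2**30

        for params, bits in [(7, 16), (13, 8), (32, 4.5), (70, 4.5)]:
            print(f"{params:>3}B @ {bits} bits/weight ~ {weight_gib(params, bits):5.1f} GiB")

    On 24 GB, a ~30B model at 4-5 bits per weight fits with room left for context, while 70B-class models don't fit without offloading.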

    replies(1): >>46200978 #
    8. veddan ◴[] No.46197967[source]
    For LLMs, I just pulled the latest llama.cpp and built it, and I haven't had any issues with it. This was quite recent, though; things used to be a lot worse, as I understand it.
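
    Once it's built, llama.cpp's bundled server gives you an OpenAI-compatible endpoint. A minimal client sketch, assuming something like `llama-server -m model.gguf` is running on the default port 8080 (model path and prompt are placeholders):

        import requests

        # Assumes `llama-server -m <model>.gguf` is listening on the default port.
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={
                "messages": [{"role": "user", "content": "Summarize ROCm in one sentence."}],
                "max_tokens": 64,
            },
            timeout=120,
        )
        print(resp.json()["choices"][0]["message"]["content"])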
    9. shmerl ◴[] No.46198984[source]
    tinygrad disagrees.
    replies(1): >>46199841 #
    10. aystatic ◴[] No.46199841{3}[source]
    Name three things using tinygrad that aren't openpilot.
    11. androiddrew ◴[] No.46200978[source]
    I bought a Radeon R9700. 32 GB of VRAM, and it does a good job.
    12. bialpio ◴[] No.46202515[source]
    I've only played with using a 7900 XT for locally hosting LLMs via Ollama (this is on Windows, mind you), and things worked fine - e.g. devstral:24b was decently fast. I haven't had time to use it for anything even semi-serious, though, so I can't comment on how useful it actually is.
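
    For anything beyond the CLI, the ollama Python package talks to that same local instance. A minimal sketch using the model tag mentioned above (the prompt is a placeholder, and the model must already be pulled):

        import ollama

        # Assumes the Ollama service is running locally and the model has been pulled.
        resp = ollama.chat(
            model="devstral:24b",
            messages=[{"role": "user", "content": "Explain ROCm vs CUDA in two sentences."}],
        )
        print(resp["message"]["content"])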