
141 points zdw | 1 comments
andrewstuart No.45665124
Despite this APU being deeply interesting to people who want to do local AI, anecdotally I hear that it’s hard to get models to run on it.

Why would AMD not have focused everything it possibly has on demonstrating and documenting and fixing and showing and smoothing the path for AI on their systems?

Why does AMD come across as so generally clueless when it comes to giving developers what they want, compared to Nvidia?

AMD should do whatever it takes to avoid this sort of situation:

https://youtu.be/cF4fx4T3Voc?si=wVmYmWVIya4DQ8Ut

pella No.45665148
"The AMD Ryzen™ AI Max+ processor is the first (and only) Windows AI PC processor capable of running large language models up to 235 Billion parameters in size. This includes support for popular models such as: Open AI's GPT-OSS 120B and Z.ai Org's GLM 4.5 Air. The large unified memory pool also allows models (up to 128 Billion parameters) to run at their maximum context length (which is a memory intensive feature) - enabling and empowering use cases involving tool-calling, MCP and agentic workflows - all available today. "

  GPT-OSS 120B MXFP4              : up to 44 tk/s
  GPT-OSS 20B MXFP4               : up to 62 tk/s
  Qwen3 235B A22B Thinking Q3 K L : up to 14 tk/s
  Qwen3 Coder 30B A3B Q4 K M      : up to 66 tk/s
  GLM 4.5 Air Q4 K M              : up to 16 tk/s
(performance in tk/s): https://www.amd.com/en/blogs/2025/amd-ryzen-ai-max-personal-...
storus No.45668203
Strix Halo can only allocate 96GB of RAM to the GPU. So GPT-OSS 120B can only be run at Q6 at best (and activations would then need to be partially stored in CPU memory).
1. ondra No.45668309
GPT-OSS 120B uses a native 4-bit representation, so it fits fine.
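
For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch. It assumes a nominal 120 billion weights (taken from the model name), treats the 96GB figure quoted upthread as GiB, and counts weight storage only: KV cache, activations, and per-block quantization scales are ignored. The function name and constants are made up for illustration.

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB: params * bits / 8 bytes, then / 2**30."""
    return params * bits_per_weight / 8 / 2**30


# Assumptions for illustration only, not measured values:
GPU_BUDGET_GIB = 96   # GPU allocation limit claimed upthread
PARAMS = 120e9        # nominal parameter count implied by "120B"

for label, bits in [("4-bit", 4), ("6-bit", 6), ("8-bit", 8), ("16-bit", 16)]:
    size = weight_gib(PARAMS, bits)
    verdict = "fits" if size <= GPU_BUDGET_GIB else "does not fit"
    print(f"{label:>6}: {size:6.1f} GiB of weights, {verdict} in {GPU_BUDGET_GIB} GiB")
```

Under those assumptions the 4-bit weights come to roughly 56 GiB and 6-bit to roughly 84 GiB, both under the quoted limit, while 8-bit (about 112 GiB) would not fit. How much headroom is left for context depends on things this sketch does not model.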