172 points marban | 15 comments
1. Aissen ◴[] No.40052746[source]
A quick search shows that support for this Ryzen AI NPU isn't integrated into upstream inference frameworks yet, so right now it's just useless silicon area you pay for :-/
replies(3): >>40052844 #>>40053100 #>>40060474 #
2. Rinzler89 ◴[] No.40052844[source]
Some AMD laptops still haven't enabled the NPU in firmware, even on the 7000 series, which is about a year old. Meaning it's still useless.

I was kinda bummed out that they released the 8000 series right after I bought a laptop with the 7000 series, but I think I actually dodged a bullet here, since it doesn't look like much of an upgrade and the AI silicon screams very early first-gen product to me, as if they rushed it out the door because everyone else was doing "AI" and they needed to cash in on the hype too, kinda like the first-gen RTX cards.

I think by the time I actually upgrade, the AI/NPU tech will have matured considerably and actually be useful.

replies(2): >>40055719 #>>40055799 #
3. dhruvdh ◴[] No.40053100[source]
There is a Vitis AI execution provider for ONNX Runtime, and you can use it from inference frameworks that support an ONNX backend. More info here: https://ryzenai.docs.amd.com/en/latest/
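
For reference, this is roughly what it looks like from Python - a minimal sketch, assuming an onnxruntime build that ships the Vitis AI EP is installed; the model path and input shape are just placeholders:

    import numpy as np
    import onnxruntime as ort

    # Point ONNX Runtime at the Vitis AI execution provider, falling back
    # to the CPU provider for anything the NPU can't run.
    # "model_int8.onnx" is a placeholder for a quantized model.
    session = ort.InferenceSession(
        "model_int8.onnx",
        providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    )

    # Dummy input matching an assumed (1, 3, 224, 224) image model.
    name = session.get_inputs()[0].name
    out = session.run(None, {name: np.zeros((1, 3, 224, 224), dtype=np.float32)})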

But regardless, 16 TOPS is no good for LLMs. Though there is a Ryzen AI demo that shows Llama 7B running on these at 8 tokens/sec. A sub-par experience for a sub-par LLM.

replies(3): >>40054182 #>>40054664 #>>40142456 #
4. Aissen ◴[] No.40054182[source]
Thanks, I was looking for information on this. It seems to be slower than pure-CPU inference on an M2, and probably much worse than a ROCm GPU-based solution?
replies(1): >>40055622 #
5. markdog12 ◴[] No.40054664[source]
Wow, that's simply embarrassing.
6. p_l ◴[] No.40055622{3}[source]
Because the NPU isn't for high-end inferencing. It's a relatively small coprocessor that is supposed to handle a bunch of tasks with high TOPS/watt without engaging the far more power-hungry GPU.

At release time, the Windows driver included, for example, a few video-processing offloads used by Windows frameworks (which MS Teams uses for background removal, among other things) - so that such tasks use less battery on laptops and free up the CPU/GPU for other work on desktops.

For higher-end processing you can use the same AIE-ML coprocessors in the various chips previously available from Xilinx and now sold under the AMD brand.

replies(1): >>40055788 #
7. robocat ◴[] No.40055719[source]
Does anyone have any mental heuristics for judging how "useless" a feature is?

Over the decades I've developed a growing antipathy towards products with too many features, especially new versions/models where the vaunted features of the previous version/model seem never to have been used by anyone.

replies(1): >>40056186 #
8. fpgamlirfanboy ◴[] No.40055788{4}[source]
> the same AIE-ML coprocessors

They're not the same - Versal ACAPs (or whatever you want to call them) have the AIE1 arch while Phoenix has the AIE2 arch. There are significant differences between the two arches (local memory, bfloat16, etc.).

replies(1): >>40056288 #
9. sva_ ◴[] No.40055799[source]
> Some AMD laptops haven't even yet enabled the NPU in firmware

This is entirely the fault of the OEMs though, not AMD. It is activated on mine, for example, but pretty much unusable under Linux at the moment (unless you're willing to run a custom kernel for it [0]).

0. https://github.com/amd/xdna-driver

replies(1): >>40056110 #
10. Rinzler89 ◴[] No.40056110{3}[source]
>This is entirely the fault of the OEMs though, not AMD.

Not true. AMD can dictate how OEMs integrate and use its chips in their products as part of the sales agreement, the same way Nvidia does.

AMD could have told every system integrator buying 7000-series chips and up that the NPU must be active in the final product.

So if the end products suck, AMD bears most of the blame for not ensuring a minimum level of QA with its integrators, who release half-assed stuff that reflects poorly on AMD in the end. It's one of the reasons Nvidia keeps such a tight grip on how its integrators use its chips.

11. Rinzler89 ◴[] No.40056186{3}[source]
>Does anyone have any mental heuristics for judging how "useless" a feature is?

My favorite example is the story I got to live through of the first generations of consumer 64 bit CPUs.

When the first AMD Athlon 64 came out in 2003, everyone I knew was buying them because they thought they were getting something totally future-proof by jumping on the 64-bit bandwagon early, back when nobody had 4GB+ of RAM yet and neither Windows nor any software would see 64-bit releases until several years later with Vista, which everyone avoided, staying on 32-bit Windows XP while waiting for Windows 7.

And by the time RAM sizes over 4GB and 64-bit software became even remotely mainstream, we already had dual- and quad-core CPUs miles ahead of those original 64-bit CPUs, which were obsolete by then (tech progress back then was wild).

So just like 64-bit silicon was a useless feature on consumer CPUs at the time, and like the first GPUs with ray tracing, I feel we're now in the same boat with AI silicon in PCs: not much SW support for it, and by the time that support does arrive, these early chips will be obsolete. It's the price of being an early adopter.

replies(1): >>40061793 #
12. p_l ◴[] No.40056288{5}[source]
Phoenix has AIE-ML (what you call AIE2); Versal has a choice of AIE (AIE1) and AIE-ML (AIE2) depending on which chip you buy.

Essentially, AMD is making two tile designs optimized for slightly different computations and says it will offer both in Versal, but the NPUs exclusively use the ML-optimized ones.

13. ◴[] No.40060474[source]
14. nercury ◴[] No.40061793{4}[source]
If AMD had not come up with the 64-bit extension, we would be saying goodbye to the x86 architecture.
15. imtringued ◴[] No.40142456[source]
In the benchmark you linked, you can clearly see that the performance of the CPU-only implementation and the NPU implementation is identical.

https://github.com/amd/RyzenAI-SW/blob/main/example/transfor...

What this should tell you is that "15 TOPs" is an irrelevant number in this benchmark. There are exactly two FLOPs per parameter per generated token, and loading the parameters takes more time than processing them.

There are people with less than 8GB of VRAM who can't load these models onto their GPU, and they end up with the exact same performance as on the CPU. The 12 TFLOPS of a 3060 Ti 8GB are "no good" for LLMs either, because the bottleneck for token generation is memory bandwidth.

My Ryzen 2700 gets 7 tokens per second at 50 GFLOPS. What does this tell you? The NPU can saturate the memory bandwidth of the system.
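
To make the bandwidth argument concrete, here's a rough back-of-envelope sketch - all the numbers are assumptions (4-bit 7B weights, ~60 GB/s dual-channel memory), not measurements:

    # At batch size 1, every weight is streamed from RAM once per generated token,
    # so decode speed is capped by memory bandwidth, not TOPS/TFLOPS.
    params = 7e9                # Llama 7B (assumed)
    bytes_per_param = 0.5       # ~4-bit quantized weights (assumed)
    bandwidth = 60e9            # ~dual-channel DDR5, in bytes/s (assumed)
    print(bandwidth / (params * bytes_per_param))  # ~17 tokens/s ceiling

The reported 8 tokens/sec sits well under that ceiling, which is why extra compute doesn't move the needle for token generation.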

Now here is the gotcha: have you tried feeding it very large prompts? That is where the speedup is going to be extremely noticeable. Instead of waiting minutes on a 2000-token prompt, it will be just as fast as on GPUs, because the initial prompt processing is compute-bound.

Also, before calling something sub-par, you're going to have to tell me how you're going to put larger models like Goliath 70B or 120B models on your GPU.