
172 points marban | 5 comments
InTheArena ◴[] No.40051885[source]
While everyone has focused on the power efficiency of Apple's M-series chips, one thing that has been very interesting is how powerful the unified memory model (with the memory on-package with the CPU) and its large memory bandwidth actually are. Hence a lot of people in the local LLaMA community are really going after high-memory Macs.

It's great to see NPUs here with the new Ryzen cores - but I wonder how effective they will be with off-die memory versus the Apple approach.
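
As a very rough illustration of why that bandwidth matters (back-of-the-envelope only; the bandwidth and model-size numbers below are illustrative assumptions, not measurements): single-stream LLM decoding is typically memory-bandwidth-bound, since generating each token means streaming essentially all of the weights.

  # Back-of-the-envelope: single-stream decode is roughly memory-bandwidth-bound,
  # so tokens/sec is capped near bandwidth / bytes read per token (~ model size).
  def rough_decode_tps(bandwidth_gb_s: float, model_size_gb: float) -> float:
      return bandwidth_gb_s / model_size_gb

  MODEL_GB = 40.0  # e.g. a ~70B model at ~4-bit quantization (illustrative)

  for label, bw in [("on-package unified memory (~800 GB/s, illustrative)", 800.0),
                    ("dual-channel DDR5 off-package (~80 GB/s, illustrative)", 80.0)]:
      print(f"{label}: ~{rough_decode_tps(bw, MODEL_GB):.0f} tokens/s upper bound")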

That said, it's nothing but great to see these capabilities in something other than an expensive NVIDIA card. Local NPUs may really help with deploying more inferencing capabilities at the edge.

Edited - sorry, meant on-package.

replies(8): >>40051950 #>>40052032 #>>40052167 #>>40052857 #>>40053126 #>>40054064 #>>40054570 #>>40054743 #
1. numpad0 ◴[] No.40054064[source]
Note that while UMA is great in the sense that it lets these models run at all, M-series chips aren't faster[1] than a discrete GPU when the model fits in its VRAM.

  1: screenshot from [2]: https://www.igorslab.de/wp-content/uploads/2023/06/Apple-M2-ULtra-SoC-Geekbench-5-OpenCL-Compute.jpg
  2: https://wccftech.com/apple-m2-ultra-soc-isnt-faster-than-amd-intel-last-year-desktop-cpus-50-slower-than-nvidia-rtx-4080/
replies(2): >>40054237 #>>40055676 #
2. cstejerean ◴[] No.40054237[source]
The problem is you're limited to 24 GB of VRAM unless you pay through the nose for datacenter GPUs, whereas you can get an M-series chip with 128 GB or 192 GB of unified memory.
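
To put rough numbers on that (a quick sketch; the parameter counts and quantization levels are illustrative): weight memory is roughly parameters × bytes per parameter, ignoring the KV cache and runtime overhead.

  # Rough weight-memory footprint: parameters x bytes per parameter.
  # Ignores KV cache, activations, and runtime overhead (illustrative only).
  def weights_gb(params_billion: float, bits_per_weight: float) -> float:
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  for params in (7, 13, 70):
      for bits in (16, 4):
          print(f"{params}B @ {bits}-bit: ~{weights_gb(params, bits):.0f} GB")

  # A 70B model is ~140 GB at fp16 and ~35 GB at 4-bit: past 24 GB either way,
  # but comfortable within 128 GB or 192 GB of unified memory.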
replies(1): >>40054469 #
3. numpad0 ◴[] No.40054469[source]
Sure! The point is just that they're not million-times-faster magic chips that will make NVIDIA go bankrupt tomorrow. That's all. A laptop with up to 128 GB of "VRAM" is a great option, absolutely no doubt about that.
replies(1): >>40054716 #
4. john_alan ◴[] No.40054716{3}[source]
They are powerful, and I agree with you: it's nice to be able to run Goliath locally, but it's a lot slower than on my 4070.
5. paulmd ◴[] No.40055676[source]
That's OpenCL compute; LLM inference should ideally be hitting the neural accelerator, not running on generalized GPU compute shaders.
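
For what it's worth, on the Apple side you can at least ask Core ML to prefer the Neural Engine when loading a converted model. A minimal sketch with coremltools, assuming an already-converted model package ("TinyLLM.mlpackage" is a placeholder name, not a real artifact):

  import coremltools as ct

  # Load a converted Core ML model and steer execution toward the Neural Engine
  # rather than GPU compute shaders. "TinyLLM.mlpackage" is a placeholder path.
  model = ct.models.MLModel(
      "TinyLLM.mlpackage",
      compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer CPU + Apple Neural Engine
  )
  # Inputs/outputs depend on how the model was converted, e.g.:
  # out = model.predict({"input_ids": token_array})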