172 points by marban | 1 comment

InTheArena ◴[] No.40051885[source]
While everyone has focused on the power efficiency of Apple's M-series chips, one thing that has been very interesting is how powerful the unified memory model (with the memory on-package with the CPU) and its large memory bandwidth actually are. Hence a lot of people in the local LLM community are really going after high-memory Macs.

It's great to see NPUs arriving with the new Ryzen chips - but I wonder how effective they will be with off-package memory versus the Apple approach (a rough bandwidth estimate is sketched below).

That said, it's nothing but great to see these capabilities in something other than an expensive NVIDIA card. Local NPUs may really help with deploying more inference capabilities at the edge.

Edited - sorry, I meant on-package.
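
A back-of-envelope sketch of why that bandwidth matters: batch-1 LLM decoding streams every weight from memory once per generated token, so memory bandwidth caps token throughput. The bandwidth and model-size figures below are rough assumptions, not measurements.

  # Rough decode-throughput ceiling: tokens/s ~= bandwidth / model size,
  # since each generated token reads all weights from memory once.
  def tokens_per_sec_ceiling(bandwidth_gb_s: float, model_gb: float) -> float:
      return bandwidth_gb_s / model_gb

  model_gb = 70 * 0.5  # assumed: 70B params at ~4-bit quantization ~ 35 GB
  for name, bw in [("M2 Ultra unified memory (~800 GB/s)", 800),
                   ("dual-channel DDR5-5600 (~90 GB/s)", 90),
                   ("RTX 4080 GDDR6X (~717 GB/s, but only 16 GB VRAM)", 717)]:
      print(f"{name}: ~{tokens_per_sec_ceiling(bw, model_gb):.0f} tok/s ceiling")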

replies(8): >>40051950 #>>40052032 #>>40052167 #>>40052857 #>>40053126 #>>40054064 #>>40054570 #>>40054743 #
numpad0 ◴[] No.40054064[source]
Note that while UMA is great in the sense that it allows large LLM models to be run at all, M-series chips aren't faster[1] than discrete GPUs when the model fits in VRAM (a quick back-of-envelope check of that condition is sketched below the footnotes).

  1: screenshot from[2]: https://www.igorslab.de/wp-content/uploads/2023/06/Apple-M2-ULtra-SoC-Geekbench-5-OpenCL-Compute.jpg
  2: https://wccftech.com/apple-m2-ultra-soc-isnt-faster-than-amd-intel-last-year-desktop-cpus-50-slower-than-nvidia-rtx-4080/
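
A minimal sketch of the "fits in VRAM" check that claim hinges on; the parameter counts, the 4-bit quantization, and the ~20% overhead for KV cache and activations are assumptions:

  # Does a quantized model fit in a given VRAM budget?
  def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead=1.2):
      # overhead is a rough allowance for KV cache and activations
      need_gb = params_billion * bits_per_weight / 8 * overhead
      return need_gb <= vram_gb, need_gb

  for params in (7, 13, 70):
      ok, need = fits_in_vram(params, 4, vram_gb=16)  # e.g. an RTX 4080's 16 GB
      print(f"{params}B @ 4-bit needs ~{need:.0f} GB -> fits in 16 GB: {ok}")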
replies(2): >>40054237 #>>40055676 #
1. paulmd ◴[] No.40055676[source]
That's OpenCL compute; LLM models ideally should be hitting the neural accelerator, not running on generalized GPU compute shaders.
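
For illustration, on Apple platforms you can ask Core ML to schedule a model on the Neural Engine via coremltools' compute_units option. A minimal sketch using a toy PyTorch module (the Linear module is a placeholder, not a real LLM; whether the ANE actually runs a given op is decided by Core ML at load time):

  import coremltools as ct
  import torch

  # Toy stand-in model; converting a real LLM is more involved.
  module = torch.nn.Linear(64, 64).eval()
  example = torch.randn(1, 64)
  traced = torch.jit.trace(module, example)

  # Prefer CPU + Neural Engine over the GPU; Core ML may still
  # fall back to CPU for ops the ANE cannot execute.
  mlmodel = ct.convert(
      traced,
      inputs=[ct.TensorType(shape=example.shape)],
      convert_to="mlprogram",
      compute_units=ct.ComputeUnit.CPU_AND_NE,
  )
  mlmodel.save("tiny_linear.mlpackage")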