577 points simonw | 1 comment
neutronicus:
If I understand correctly, the author is managing to run this model on a laptop with 64GB of RAM?

So a home workstation with 64GB+ of RAM could get similar results?

simonw:
Only if that RAM is available to a GPU, or you're willing to tolerate extremely slow responses.

The neat thing about Apple Silicon is that the system RAM is available to the GPU. On most other systems you would need ~48GB of VRAM.
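The ~48GB figure can be sanity-checked with a back-of-the-envelope estimate: weights take (parameter count × bits per weight / 8) bytes, plus some runtime overhead. A minimal sketch, where the 70B parameter count, 4-bit quantization, and 1.2× overhead multiplier are illustrative assumptions, not figures from the thread:

```python
# Rough memory-footprint estimate for running an LLM locally.
# All numbers below are illustrative assumptions, not measurements.

def model_memory_gb(n_params_b: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Approximate RAM/VRAM needed to hold the model.

    n_params_b: parameter count in billions
    bits_per_weight: e.g. 16 (fp16), 8, or 4 (quantized)
    overhead: multiplier covering KV cache and runtime buffers (assumed)
    """
    weight_bytes = n_params_b * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# A hypothetical 70B-parameter model quantized to 4 bits:
print(round(model_memory_gb(70, 4), 1), "GB")  # ~42 GB, in the ballpark of ~48GB VRAM
```

Under these assumptions the weights alone land around 35 GB, and with overhead the total is close to the ~48GB VRAM figure quoted above.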

sagarm:
LLM inference on both GPU and CPU is memory-bandwidth constrained. The highest-end Apple machines are good for this because they pair ~500 GB/s of memory bandwidth with up to ~128GB of RAM, not just because they share that memory with the GPU (any iGPU does that). Most consumer machines are limited to two DDR5 channels (~50 GB/s).
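Being bandwidth-bound has a simple consequence: each generated token streams every weight through the compute units once, so decode speed is roughly memory bandwidth divided by model size in bytes. A quick sketch using the bandwidth figures from this comment and a hypothetical 42 GB quantized model:

```python
# Back-of-the-envelope decode speed when memory-bandwidth bound:
# tokens/sec ~= memory bandwidth / bytes of model weights read per token.
# Bandwidth figures are the approximate numbers from the comment;
# the 42 GB model size is a hypothetical quantized 70B model.

def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 42  # assumed: 70B parameters at 4-bit quantization

for name, bw in [("High-end Apple Silicon (~500 GB/s)", 500),
                 ("Dual-channel DDR5 desktop (~50 GB/s)", 50)]:
    print(f"{name}: ~{tokens_per_sec(bw, MODEL_GB):.1f} tok/s")
```

Under these assumptions the Apple machine lands around 12 tok/s while a dual-channel DDR5 desktop manages roughly 1 tok/s, which is why CPU-only inference on a typical workstation feels "extremely slow" even when the model fits in RAM.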