577 points simonw | 1 comment
neutronicus:
If I understand correctly, the author is managing to run this model on a laptop with 64GB of RAM?

So a home workstation with 64GB+ of RAM could get similar results?

simonw:
Only if that RAM is available to a GPU, or you're willing to tolerate extremely slow responses.

The neat thing about Apple Silicon is that the system RAM is available to the GPU. On most other systems you would need ~48GB of VRAM.
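The ~48GB figure can be sanity-checked with a back-of-the-envelope estimate: weights take (parameter count × bits per weight / 8) bytes, plus some runtime overhead. A minimal sketch, where the 70B parameter count, 4-bit quantization, and 1.2× overhead multiplier are illustrative assumptions, not figures from the thread:

```python
# Rough memory-footprint estimate for running an LLM locally.
# All numbers below are illustrative assumptions, not measurements.

def model_memory_gb(n_params_b: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Approximate RAM/VRAM needed to hold the model.

    n_params_b: parameter count in billions
    bits_per_weight: e.g. 16 (fp16), 8, or 4 (quantized)
    overhead: multiplier covering KV cache and runtime buffers (assumed)
    """
    weight_bytes = n_params_b * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# A hypothetical 70B-parameter model quantized to 4 bits:
print(round(model_memory_gb(70, 4), 1), "GB")  # ~42 GB, in the ballpark of ~48GB VRAM
```

Under these assumptions the weights alone land around 35 GB, and with overhead the total is close to the ~48GB VRAM figure quoted above.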

sagarm:
LLM inference on both GPU and CPU is memory-bandwidth constrained. The highest-end Apple machines are good for this because they pair ~500 GB/s of memory bandwidth with up to ~128GB of RAM, not just because they share that memory with the GPU (any iGPU does that). Most consumer machines are limited to two DDR5 channels (~50 GB/s).
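Being bandwidth-bound has a simple consequence: each generated token streams every weight through the compute units once, so decode speed is roughly memory bandwidth divided by model size in bytes. A quick sketch using the bandwidth figures from this comment and a hypothetical 42 GB quantized model:

```python
# Back-of-the-envelope decode speed when memory-bandwidth bound:
# tokens/sec ~= memory bandwidth / bytes of model weights read per token.
# Bandwidth figures are the approximate numbers from the comment;
# the 42 GB model size is a hypothetical quantized 70B model.

def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 42  # assumed: 70B parameters at 4-bit quantization

for name, bw in [("High-end Apple Silicon (~500 GB/s)", 500),
                 ("Dual-channel DDR5 desktop (~50 GB/s)", 50)]:
    print(f"{name}: ~{tokens_per_sec(bw, MODEL_GB):.1f} tok/s")
```

Under these assumptions the Apple machine lands around 12 tok/s while a dual-channel DDR5 desktop manages roughly 1 tok/s, which is why CPU-only inference on a typical workstation feels "extremely slow" even when the model fits in RAM.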