Apart from using a Mac, what can you use for inference with reasonable performance? Is a Mac the only realistic option at the moment?
This one should just about fit at q4 on a box with an RTX 4090 and 64GB of RAM (which is what I've got). Don't know what the performance will be yet; I'm hoping for an Unsloth dynamic quant to get the most out of it. A rough sizing sketch is below.
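For rough sizing you can estimate the quantized footprint from parameter count. A minimal back-of-envelope sketch, assuming a hypothetical ~70B-parameter model and ~4.5 effective bits/weight for a q4-class GGUF quant (the quant scales add overhead beyond the nominal 4 bits); the KV-cache headroom is a loose assumption too:

```python
# Back-of-envelope quantized model size. Parameter count, bits/weight,
# and KV-cache headroom are illustrative assumptions, not specs of any
# particular model.

def quant_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of a quantized model in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

params_b = 70        # assumed parameter count, in billions
bits_q4 = 4.5        # q4_K_M-style quants average roughly 4.5 bits/weight

weights_gb = quant_size_gb(params_b, bits_q4)
kv_cache_gb = 4      # rough headroom for KV cache + runtime buffers

print(f"weights: ~{weights_gb:.0f} GB, total: ~{weights_gb + kv_cache_gb:.0f} GB")
# -> weights: ~39 GB, total: ~43 GB
```

At those assumed numbers it won't fit in 24GB of VRAM alone, but it does fit in 24GB VRAM + 64GB system RAM with some layers offloaded, which is why the comment below about where the model lives matters so much.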
What's important is VRAM, not system RAM. The 4090 has 24GB of VRAM, so you'll be limited to smaller models at decent speeds. Of course, you can run models from system memory, but your tokens/second will be roughly an order of magnitude slower.
ARM Macs are the exception: they have unified memory, so the GPU gets high-bandwidth access to the full pool of system RAM.
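To put rough numbers on the bandwidth gap: single-stream token generation is memory-bandwidth-bound (every weight gets read once per token), so tokens/sec is capped by bandwidth divided by model size. A sketch using approximate published bandwidth specs and the ~40GB q4 example from above:

```python
# Upper-bound tokens/sec for single-stream decoding, assuming it is
# purely memory-bandwidth-bound: every weight is read once per token.
# Bandwidth figures are approximate published specs.

MODEL_GB = 40  # the ~40GB q4 example from above

bandwidth_gb_s = {
    "RTX 4090 (GDDR6X)": 1008,
    "M2 Ultra (unified)": 800,
    "M2 Max (unified)": 400,
    "DDR5-5600 dual channel": 90,
}

for device, bw in bandwidth_gb_s.items():
    print(f"{device:>24}: <= {bw / MODEL_GB:.1f} tok/s")
# RTX 4090: ~25 tok/s (if the model fit entirely in its 24GB, which
# a 40GB quant doesn't), M2 Ultra: ~20, M2 Max: ~10, DDR5: ~2.3.
```

That last line is the whole argument in one number: plain dual-channel DDR5 is roughly 10x slower than either GDDR6X or Apple's unified memory, so a model spilled into system RAM on a PC crawls, while a Mac can serve the same model from its unified pool at usable speeds.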