602 points emrah | 3 comments
holografix No.43743631
Could 16GB of VRAM be enough for the 27B QAT version?
replies(5): >>43743634 >>43743704 >>43743825 >>43744249 >>43756253
1. halflings No.43743634
That's what the chart says, yes: 14.1GB of VRAM usage for the 27B model.
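
Back-of-envelope, that number is consistent with ~4-bit QAT weights. A minimal sketch; the per-parameter byte count and the overhead are assumptions, not the published checkpoint format:

    # Rough weight-memory estimate for a 27B-parameter model.
    # Assumes ~4 bits (0.5 bytes) per parameter for int4 QAT weights;
    # a real checkpoint adds overhead (quantization scales, embeddings, etc.).
    params = 27e9
    bytes_per_param = 0.5            # assumed int4 quantization
    weight_gib = params * bytes_per_param / 1024**3
    print(f"{weight_gib:.1f} GiB")   # ~12.6 GiB, in the ballpark of the 14.1GB reported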
replies(1): >>43743678
2. erichocean No.43743678
That's the VRAM required just to load the model weights.

To actually use the model, you also need memory for the context window. Realistically, you'll want a 20GB GPU or larger, depending on how many tokens you need.
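
For a rough sense of how fast context eats memory: every token in the window stores a key tensor and a value tensor in each layer. The architecture numbers below are illustrative placeholders, not the actual 27B config:

    # Rough KV-cache sizing: 2 tensors (K and V) per layer per token.
    # All architecture values here are assumed for illustration only.
    num_layers   = 60
    num_kv_heads = 16      # grouped-query attention keeps this small
    head_dim     = 128
    bytes_per_el = 2       # fp16 cache entries
    ctx_len      = 8192

    kv_bytes = 2 * num_layers * num_kv_heads * head_dim * bytes_per_el * ctx_len
    print(f"{kv_bytes / 1024**3:.1f} GiB")  # ~3.8 GiB at an 8k context

On top of ~14GB of weights, a few extra GB of cache is exactly what pushes you past a 16GB card.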

replies(1): >>43743834
3. oezi No.43743834
I didn't realize that the context would require so much memory. Is this the KV cache? It would seem like a big advantage if this memory requirement could be reduced.
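
It is the KV cache, and it grows linearly with context length, which is why quantizing the cache (e.g. fp16 down to 8-bit) is a common mitigation. A sketch using the same assumed config as above:

    # KV-cache memory scales linearly with context length; quantizing
    # cache entries (fp16 -> 8-bit) halves it. Config values are the
    # same illustrative assumptions as in the earlier sketch.
    def kv_cache_gib(ctx_len, bytes_per_el=2, num_layers=60,
                     num_kv_heads=16, head_dim=128):
        return (2 * num_layers * num_kv_heads * head_dim
                * bytes_per_el * ctx_len) / 1024**3

    for ctx in (4096, 16384, 65536):
        print(ctx, f"fp16: {kv_cache_gib(ctx):.1f} GiB,",
                   f"8-bit: {kv_cache_gib(ctx, bytes_per_el=1):.1f} GiB")

Some runtimes already expose this; llama.cpp, for instance, has options for quantized KV-cache types.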