602 points emrah | 3 comments
holografix No.43743631
Could 16GB of VRAM be enough for the 27B QAT version?
replies(5): >>43743634 >>43743704 >>43743825 >>43744249 >>43756253
1. halflings No.43743634
That's what the chart says, yes: 14.1GB of VRAM usage for the 27B model.
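
Back-of-envelope, that number is consistent with ~4-bit QAT weights. A minimal sketch; the per-parameter byte count and the overhead are assumptions, not the published checkpoint format:

    # Rough weight-memory estimate for a 27B-parameter model.
    # Assumes ~4 bits (0.5 bytes) per parameter for int4 QAT weights;
    # a real checkpoint adds overhead (quantization scales, embeddings, etc.).
    params = 27e9
    bytes_per_param = 0.5            # assumed int4 quantization
    weight_gib = params * bytes_per_param / 1024**3
    print(f"{weight_gib:.1f} GiB")   # ~12.6 GiB, in the ballpark of the 14.1GB reported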
replies(1): >>43743678
2. erichocean No.43743678
That's the VRAM required just to load the model weights.

To actually use the model, you also need memory for the context window. Realistically, you'll want a 20GB GPU or larger, depending on how many tokens you need.
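
For a rough sense of how fast context eats memory: every token in the window stores a key tensor and a value tensor in each layer. The architecture numbers below are illustrative placeholders, not the actual 27B config:

    # Rough KV-cache sizing: 2 tensors (K and V) per layer per token.
    # All architecture values here are assumed for illustration only.
    num_layers   = 60
    num_kv_heads = 16      # grouped-query attention keeps this small
    head_dim     = 128
    bytes_per_el = 2       # fp16 cache entries
    ctx_len      = 8192

    kv_bytes = 2 * num_layers * num_kv_heads * head_dim * bytes_per_el * ctx_len
    print(f"{kv_bytes / 1024**3:.1f} GiB")  # ~3.8 GiB at an 8k context

On top of ~14GB of weights, a few extra GB of cache is exactly what pushes you past a 16GB card.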

replies(1): >>43743834
3. oezi No.43743834
I didn't realize that the context would require so much memory. Is this the KV cache? It would seem like a big advantage if this memory requirement could be reduced.
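
It is the KV cache, and it grows linearly with context length, which is why quantizing the cache (e.g. fp16 down to 8-bit) is a common mitigation. A sketch using the same assumed config as above:

    # KV-cache memory scales linearly with context length; quantizing
    # cache entries (fp16 -> 8-bit) halves it. Config values are the
    # same illustrative assumptions as in the earlier sketch.
    def kv_cache_gib(ctx_len, bytes_per_el=2, num_layers=60,
                     num_kv_heads=16, head_dim=128):
        return (2 * num_layers * num_kv_heads * head_dim
                * bytes_per_el * ctx_len) / 1024**3

    for ctx in (4096, 16384, 65536):
        print(ctx, f"fp16: {kv_cache_gib(ctx):.1f} GiB,",
                   f"8-bit: {kv_cache_gib(ctx, bytes_per_el=1):.1f} GiB")

Some runtimes already expose this; llama.cpp, for instance, has options for quantized KV-cache types.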