slacker news
Gemma 3 QAT Models: Bringing AI to Consumer GPUs
(developers.googleblog.com)
602 points by emrah | 1 comment | 20 Apr 25 12:22 UTC
holografix | 20 Apr 25 13:27 UTC | No. 43743631
>>43743337 (OP)
Could 16 GB of VRAM be enough for the 27B QAT version?
replies(5): >>43743634 >>43743704 >>43743825 >>43744249 >>43756253
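A rough back-of-envelope estimate suggests why 16 GB is borderline for a 27B model at 4-bit. The numbers below (bits per weight, runtime overhead) are illustrative assumptions, not official Gemma 3 figures:

```python
# Back-of-envelope VRAM estimate for a 4-bit quantized model.
# bits_per_weight and overhead_gb are assumptions, not measured values.

def estimate_vram_gb(n_params_billions, bits_per_weight=4.0, overhead_gb=2.0):
    """Weights at the given bit width, plus a flat allowance for
    the KV cache, activations, and runtime buffers."""
    weights_gb = n_params_billions * 1e9 * (bits_per_weight / 8) / 1e9
    return weights_gb + overhead_gb

# 27B parameters at 4 bits: ~13.5 GB of weights alone,
# so ~15.5 GB total under these assumptions - tight on a 16 GB card.
print(f"{estimate_vram_gb(27):.1f} GB")  # → 15.5 GB
```

Under these assumptions the weights barely fit, leaving little headroom for context; longer contexts would push past 16 GB.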
hskalin | 20 Apr 25 14:02 UTC | No. 43743825
>>43743631
With ollama you can offload a few layers to the CPU if they don't fit in VRAM. This costs some performance, of course, but it's much better than the alternative (everything on the CPU).
replies(2): >>43744666 >>43752342
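A minimal sketch of the partial-offload approach, using ollama's `num_gpu` parameter (which controls how many layers are placed on the GPU); the layer count of 40 is an illustrative guess, not a tuned value:

```shell
# Interactively: cap the number of GPU-resident layers; the rest run on the CPU.
ollama run gemma3:27b-it-qat
# then inside the session:
#   /set parameter num_gpu 40

# Or bake it into a Modelfile (layer count is an assumption - tune for your card):
#   FROM gemma3:27b-it-qat
#   PARAMETER num_gpu 40
# ollama create gemma3-offload -f Modelfile
```

Lowering `num_gpu` until the model loads trades tokens/s for fitting in VRAM, which matches the trade-off described above.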
dockerd | 21 Apr 25 14:22 UTC | No. 43752342
>>43743825
Does it work in LM Studio? Loading 27b-it-qat takes up more than 22 GB on a 24 GB Mac.