
602 points by emrah
holografix:
Could 16 GB of VRAM be enough for the 27B QAT version?
parched99:
I am only able to get Gemma-3-27b-it-qat-Q4_0.gguf (15.6 GB) to run with a 100-token context size on a 5070 Ti (16 GB) using llama.cpp (a rough invocation is sketched after the numbers below).

Prompt:     10 tokens,  229.089 ms,  43.7 t/s
Generation: 41 tokens,  959.412 ms,  42.7 t/s

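For reference, here is a minimal sketch of that kind of setup using the llama-cpp-python bindings (an assumption; the original comment only says llama.cpp, and the model directory and prompt below are placeholders). The point is simply that the Q4_0 weights alone take ~15.6 GB, so on a 16 GB card the context has to stay tiny for the KV cache and compute buffers to fit in what's left:

    # Back-of-envelope: ~15.6 GB of weights on a 16 GB card leaves only a few
    # hundred MB for KV cache and buffers, hence the ~100-token context.
    from llama_cpp import Llama  # assumes the llama-cpp-python bindings

    llm = Llama(
        model_path="models/gemma-3-27b-it-qat-Q4_0.gguf",  # placeholder path
        n_ctx=100,        # tiny context window, as in the numbers above
        n_gpu_layers=-1,  # offload every layer to the GPU
    )

    out = llm("Why is the sky blue?", max_tokens=41)
    print(out["choices"][0]["text"])

With a larger n_ctx the KV cache no longer fits alongside the weights, which matches the 100-token limit reported above.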
floridianfisher:
Try one of the smaller versions. 27B is too big for your GPU.
parched99:
I'm aware. I was addressing the question being asked.