
602 points by emrah

trebligdivad:
It seems pretty impressive. I'm running it on my CPU (16-core AMD 3950X) and it's very good at translation, and the image description is very impressive as well. I'm getting about 2.3 tokens/s (compared to under 1 token/s on the Calme-3.2 I was previously using). It does tend to be a bit chatty unless you tell it not to be: it'll give you a 'breakdown' of pretty much everything, so for translation my prompt is 'Translate the input to English, only output the translation' to stop it giving a breakdown of the input language.
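A minimal sketch of that translation setup with llama.cpp's llama-cli; the model path, thread count, and sample input are assumptions, only the prompt wording comes from the comment above:

```shell
# Hypothetical paths/values; the instruction text is the prompt quoted above.
./build/bin/llama-cli \
  -m /path/to/gemma-3-27b-it-q4_0.gguf \
  -t 16 \
  -n 128 \
  -p "Translate the input to English, only output the translation: Bonjour, comment allez-vous ?"
```

Putting "only output the translation" in the prompt is what suppresses the model's habit of appending a grammatical breakdown of the source language.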
simonw:
What are you using to run it? I haven't got image input working yet myself.
trebligdivad:
I'm using llama.cpp, built last night from head. To do image stuff you have to run a separate client they provide, with something like:

./build/bin/llama-gemma3-cli -m /discs/fast/ai/gemma-3-27b-it-q4_0.gguf --mmproj /discs/fast/ai/mmproj-model-f16-27B.gguf -p "Describe this image." --image ~/Downloads/surprise.png

Note the second gguf in there - I'm not sure, but I think that's the multimodal projector, used to encode the image.