
602 points by emrah | 1 comment
trebligdivad ◴[] No.43744014[source]
It seems pretty impressive - I'm running it on my CPU (16-core AMD 3950X) and it's very, very impressive at translation, and the image description is very impressive as well. I'm getting about 2.3 tokens/s (compared to under 1/s on the Calme-3.2 I was previously using). It does tend to be a bit chatty unless you tell it not to be; it'll give you a 'breakdown' of pretty much everything unless you tell it not to - so for translation my prompt is 'Translate the input to English, only output the translation' to stop it giving a breakdown of the input language.
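If you want to try the same prompt trick locally, something roughly like this works with llama-cpp-python (just one common way to run GGUF models on CPU - the model filename, thread count and sample text here are placeholders, not what I actually ran):

    # Sketch: translation-only prompting with llama-cpp-python.
    # The GGUF filename, context size and thread count are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="gemma-3-27b-it-Q4_K_M.gguf",  # hypothetical local quant
        n_ctx=8192,
        n_threads=16,  # e.g. one thread per physical core on a 3950X
    )

    resp = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": "Translate the input to English, only output the translation.\n\n"
                       "Guten Morgen, wie geht es dir?",
        }],
        temperature=0.0,  # deterministic output for translation
    )
    print(resp["choices"][0]["message"]["content"])

Temperature 0 keeps the output deterministic, which is what you want for translation anyway; the instruction in the prompt is what stops the 'breakdown' chatter.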
replies(2): >>43744070 #>>43747653 #
Havoc ◴[] No.43747653[source]
The upcoming Qwen3 series is supposed to be MoE... likely to give better tok/s on CPU
replies(1): >>43748355 #
slekker ◴[] No.43748355[source]
What's MoE?
replies(2): >>43748381 #>>43749736 #
zamalek ◴[] No.43748381[source]
Mixture of Experts. Very broadly speaking, the model is made up of a bunch of mini networks (experts), and a router activates only a few of them for each token, so you get a big model's capacity at a fraction of the per-token compute - which is why MoE models tend to give better tok/s on CPU.
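To make that concrete, here's a toy sketch of the routing idea in PyTorch - this isn't Gemma's or Qwen's actual layer, and the expert count / top-k values are made up:

    # Toy mixture-of-experts layer: a router picks the top-k experts per token,
    # so only a fraction of the parameters do work on any given token.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoE(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, num_experts)  # scores each expert
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):                      # x: (tokens, d_model)
            scores = self.router(x)                # (tokens, num_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e       # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

Only top_k of the num_experts feed-forward blocks actually run for a given token, which is how an MoE model can have a huge total parameter count but still get reasonable tok/s on CPU.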