
602 points by emrah | 1 comment
noodletheworld ◴[] No.43743667[source]
?

Am I missing something?

These have been out for a while; if you follow the HF link you can see, for example, the 27b quant has been downloaded from HF 64,000 times over the last 10 days.

Is there something more to this, or is it just a follow-up blog post?

(Is it just that Ollama finally has partial support — no images, right? Or something else?)

replies(3): >>43743700 #>>43743748 #>>43754518 #
deepsquirrelnet ◴[] No.43743700[source]
QAT (“quantization-aware training”) means the model was quantized to 4 bits during training, rather than being trained in full or half precision and quantized afterwards. It supposedly yields higher quality, but unfortunately they don’t show any comparisons between QAT and post-training quantization.
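To make the distinction concrete, here is a minimal sketch of the “fake quantization” idea at the heart of QAT: during training, weights are rounded to the 4-bit grid and dequantized again in the forward pass, so the network learns under the rounding error it will see at inference time. (This is an illustrative NumPy sketch of the general technique, not Google’s actual training code; the function name and symmetric-int4 scheme are assumptions.)

```python
import numpy as np

def fake_quant_int4(w, eps=1e-8):
    """Simulate symmetric 4-bit quantization: quantize, then dequantize.

    In QAT this runs inside the forward pass, so the loss reflects the
    rounding error; gradients typically flow through the rounding via a
    straight-through estimator.
    """
    scale = np.max(np.abs(w)) / 7 + eps        # int4 symmetric range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7)    # integer codes on the 4-bit grid
    return q * scale                           # dequantized values the model "sees"

w = np.array([0.91, -0.37, 0.05, -0.88])
w_q = fake_quant_int4(w)
err = np.max(np.abs(w - w_q))  # rounding error the training loss gets to adapt to
```

Post-training quantization applies the same rounding only once, after training is finished, so the weights never get a chance to compensate for it — which is why QAT variants tend to hold up better at 4 bits.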
replies(1): >>43743713 #
noodletheworld ◴[] No.43743713[source]
I understand that, but the QAT models [1] are not new uploads.

How is this more significant now than when they were uploaded 2 weeks ago?

Are we expecting new models? I don’t understand the timing. This post feels like it’s two weeks late.

[1] - https://huggingface.co/collections/google/gemma-3-qat-67ee61...

replies(2): >>43743759 #>>43743843 #
simonw ◴[] No.43743843[source]
The official announcement of the QAT models happened on Friday 18th, two days ago. It looks like they uploaded them to HF in advance of that announcement: https://developers.googleblog.com/en/gemma-3-quantized-aware...

The partnership with Ollama and MLX and LM Studio and llama.cpp was revealed in that announcement, which made the models a lot easier for people to use.