602 points emrah | 11 comments
1. noodletheworld No.43743667
?

Am I missing something?

These have been out for a while; if you follow the HF link you can see that, for example, the 27B quant has been downloaded from HF 64,000 times over the last 10 days.

Is there something more to this, or is it just a follow-up blog post?

(Is it just that Ollama finally has partial support (no images, right?), or something else?)

replies(3): >>43743700 #>>43743748 #>>43754518 #
2. deepsquirrelnet No.43743700
QAT (quantization-aware training) means the 4-bit quantization was simulated during training, rather than the model being quantized after training in full or half precision. It supposedly gives higher quality, but unfortunately they don't show any comparisons between QAT and post-training quantization.
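
For intuition, here's a minimal PyTorch sketch of the usual fake-quantization trick behind QAT (a generic illustration, not Google's actual training recipe):

    # Weights are "fake-quantized" to 4 bits in the forward pass, while
    # gradients flow through the rounding unchanged (straight-through
    # estimator), so the model learns weights that survive quantization.
    import torch
    import torch.nn as nn

    def fake_quantize_4bit(w: torch.Tensor) -> torch.Tensor:
        # Symmetric per-tensor 4-bit quantization: integer levels in [-8, 7].
        scale = w.abs().max().clamp(min=1e-8) / 7.0
        dequant = (w / scale).round().clamp(-8, 7) * scale
        # Forward uses the quantized value; backward treats rounding as identity.
        return w + (dequant - w).detach()

    class QATLinear(nn.Linear):
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return nn.functional.linear(x, fake_quantize_4bit(self.weight), self.bias)

    # Train as usual; the weights settle where 4-bit rounding hurts least,
    # so the exported quantized model loses less quality.
    layer = QATLinear(16, 16)
    layer(torch.randn(2, 16)).sum().backward()
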
replies(1): >>43743713 #
3. noodletheworld No.43743713
I understand that, but the QAT models [1] are not new uploads.

How is this more significant now than when they were uploaded 2 weeks ago?

Are we expecting new models? I don’t understand the timing. This post feels like it’s two weeks late.

[1] - https://huggingface.co/collections/google/gemma-3-qat-67ee61...

replies(2): >>43743759 #>>43743843 #
4. xnx No.43743748
The linked blog post was posted 2 days ago.
5. llmguy No.43743759
8 days is closer to 1 week than 2. And it's a blog post; nobody owes you realtime updates.
replies(1): >>43743783 #
6. noodletheworld No.43743783
https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf/t...

> 17 days ago

Anywaaay...

I'm literally asking, quite honestly, whether this is just an 'after the fact' update, weeks later, on models they already uploaded, or whether there is something more significant about this that I'm missing.

replies(2): >>43743882 #>>43744308 #
7. simonw No.43743843
The official announcement of the QAT models happened on Friday 18th, two days ago. It looks like they uploaded them to HF in advance of that announcement: https://developers.googleblog.com/en/gemma-3-quantized-aware...

The partnership with Ollama and MLX and LM Studio and llama.cpp was revealed in that announcement, which made the models a lot easier for people to use.

8. timcobb No.43743882
Probably the former... I see your confusion, but it's really only a couple of weeks at most. The news cycle is strong in you, grasshopper :)
9. osanseviero No.43744308
Hi! Omar from the Gemma team here.

Last time we only released the quantized GGUFs, so only llama.cpp users could use them (plus Ollama, but without vision).

Now we've released the unquantized checkpoints, so anyone can quantize them themselves and use them in their favorite tools, including Ollama with vision, MLX, LM Studio, etc. The MLX folks also found that the QAT model quantized to 3 bits worked decently compared to naive 3-bit quantization, so releasing the unquantized checkpoints enables further experimentation and research.

TL;DR: The first was a release in a specific format/tool; we followed up with a full release of artifacts that enable the community to do much more.
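
For illustration, quantizing the unquantized checkpoint yourself with mlx-lm might look like this (a sketch; the repo id and exact convert arguments are my assumptions, so check the current mlx-lm docs and Hugging Face for the real names):

    # Hypothetical example: pull the unquantized QAT checkpoint and
    # quantize it to 3 bits with mlx-lm's convert API.
    from mlx_lm import convert

    convert(
        "google/gemma-3-27b-it-qat-q4_0-unquantized",  # assumed repo id
        mlx_path="gemma-3-27b-qat-3bit",
        quantize=True,
        q_bits=3,
    )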

replies(1): >>43744374 #
10. oezi No.43744374
Hey Omar, is there any chance that Gemma 3 might get a speech (ASR/AST/TTS) release?
11. Patrick_Devine No.43754518
Ollama has had vision support for Gemma 3 since it came out. The implementation is not based on llama.cpp's version.