
602 points emrah | 3 comments
1. api ◴[] No.43744419[source]
When I see 32B or 70B models performing similarly to 200+B models, I don’t know what to make of this. Either the latter contain more breadth of information but we have managed to distill the latent capabilities down to something similar, the larger models are just less efficient, or the tests are not very good.
replies(2): >>43744582 #>>43744783 #
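[For readers unfamiliar with the "distill" part: the standard knowledge-distillation recipe trains a small student to match a large teacher's output distribution. A minimal sketch assuming PyTorch; the teacher/student models, temperature T, and weighting alpha are illustrative placeholders, not anything claimed in the thread.]

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Blend hard-label cross-entropy with a soft-target KL term from the teacher."""
        # Soft targets: compare student and teacher distributions at temperature T.
        soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
        soft_student = F.log_softmax(student_logits / T, dim=-1)
        kl = F.kl_div(soft_student, soft_teacher, log_target=True,
                      reduction="batchmean") * (T * T)
        # Hard targets: ordinary cross-entropy against the ground-truth labels.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kl + (1 - alpha) * ce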
2. simonw ◴[] No.43744582[source]
It makes intuitive sense to me that this would be possible, because LLMs are still mostly opaque black boxes. I expect you could drop a whole bunch of the weights without having a huge impact on quality - maybe you end up mostly ditching the parts that are derived from shitposts on Reddit while keeping the bits from Arxiv, for example.

(That's a massive simplification of how any of this works, but it's how I think about it at a high level.)
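[One concrete version of the "drop a bunch of the weights" intuition is global magnitude pruning, which zeroes the smallest-magnitude parameters and often costs surprisingly little quality at moderate sparsity. A rough sketch assuming PyTorch; the model argument and the 50% sparsity level are placeholders.]

    import torch

    @torch.no_grad()
    def magnitude_prune(model, sparsity=0.5):
        """Zero out the smallest-magnitude weights across all linear layers."""
        weights = [m.weight for m in model.modules()
                   if isinstance(m, torch.nn.Linear)]
        # Find the global magnitude threshold for the requested sparsity.
        all_vals = torch.cat([w.abs().flatten() for w in weights])
        threshold = torch.quantile(all_vals, sparsity)
        # Apply the mask in place: anything below the threshold becomes zero.
        for w in weights:
            w.mul_((w.abs() >= threshold).to(w.dtype))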

3. retinaros ◴[] No.43744783[source]
It's just BS benchmarks. They're all cheating at this point, feeding the test data into the training set. That doesn't mean the LLMs aren't getting better, but when they all lie...
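[On the contamination claim: one common, if crude, way to check whether benchmark items leaked into training data is verbatim n-gram overlap. A minimal sketch assuming plain-text inputs; the 8-gram window is an arbitrary choice here, not a standard the thread cites.]

    def ngrams(text, n=8):
        """Return the set of n-word shingles in a text."""
        toks = text.lower().split()
        return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def looks_contaminated(benchmark_item, training_docs, n=8):
        """Flag a benchmark item if any of its n-grams appears verbatim in the training data."""
        item_grams = ngrams(benchmark_item, n)
        return any(item_grams & ngrams(doc, n) for doc in training_docs)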