(developers.googleblog.com)

602 points emrah | 1 comments | 20 Apr 25 12:22 UTC

diggan [20 Apr 25 13:29 UTC] No.43743644

The first graph compares the "Elo Score" of various models at "native" BF16 precision, and the second compares VRAM usage between native BF16 precision and their QAT models. But since this method is about doing quantization while also maintaining quality, isn't the obvious graph missing, the one comparing quality between BF16 and QAT? The text doesn't seem to talk about it either, yet it's basically the topic of the blog post.

1. croemer [20 Apr 25 14:14 UTC] No.43743893

>>43743644

Indeed, the one thing I was looking for was Elo/performance of the quantized models, not how good the base model is. Showing how much memory is saved by quantization in a figure is a bit of an insult to the intelligence of the reader.
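
To spell out why the memory figure is predictable, here is a rough sketch of the arithmetic. It assumes weight memory is simply parameter count times bits per weight, and it ignores the KV cache, activations, and runtime overhead. The 27-billion-parameter count and the 4-bit width are illustrative inputs, and `weight_memory_gb` is a made-up helper name, not anything from the blog post.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    extra storage for quantization scales, KV cache, or activations.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Illustrative inputs: a 27-billion-parameter model.
params = 27e9
bf16_gb = weight_memory_gb(params, 16)  # BF16 uses 16 bits per weight
int4_gb = weight_memory_gb(params, 4)   # 4-bit quantized weights

print(f"BF16: {bf16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, ratio: {bf16_gb / int4_gb:.1f}x")
```

Under these assumptions the saving is just the ratio of the bit widths, which is why the quality comparison is the more interesting graph.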

Gemma 3 QAT Models: Bringing AI to Consumer GPUs