(developers.googleblog.com)

602 points emrah | 1 comments | 20 Apr 25 12:22 UTC

diggan ◴[20 Apr 25 13:29 UTC] No.43743644[source]▶

The first graph compares the "Elo Score" of various models at "native" BF16 precision, and the second compares VRAM usage between native BF16 precision and their QAT models. But since this method is about quantizing while maintaining quality, isn't the obvious graph missing: one comparing quality between BF16 and QAT? The text doesn't seem to address it either, yet it's basically the topic of the blog post.

replies(3): >>43743893 #>>43743928 #>>43745363 #

1. nithril ◴[20 Apr 25 14:19 UTC] No.43743928[source]▶

>>43743644 #

In addition, the "Massive VRAM Savings" graph states what looks like a tautology: reducing from 16 bits to 4 bits unsurprisingly leads to a 4x reduction in memory usage.
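To make the 16-bit to 4-bit arithmetic concrete, here is a rough sketch. It counts weight storage only and ignores activations, KV cache, and any per-block quantization metadata, so real numbers will be somewhat higher. The 27B parameter count and the helper name `weight_memory_gb` are assumptions for illustration, not figures taken from the blog post.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
params = 27e9

bf16_gb = weight_memory_gb(params, 16)  # 16 bits per weight
int4_gb = weight_memory_gb(params, 4)   # 4 bits per weight
ratio = bf16_gb / int4_gb               # independent of the parameter count

print(f"BF16: {bf16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, ratio: {ratio:.1f}x")
```

The ratio depends only on the bit widths, which is why the graph by itself says nothing about whether quality survives quantization.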

Gemma 3 QAT Models: Bringing AI to Consumer GPUs