
602 points emrah | 1 comment
diggan ◴[] No.43743644[source]
The first graph compares the "Elo Score" of various models at "native" BF16 precision, and the second compares VRAM usage between native BF16 precision and their QAT models. But since this method is about quantizing while also maintaining quality, isn't the obvious graph missing: a comparison of quality between BF16 and QAT? The text doesn't seem to talk about it either, yet it's basically the topic of the blog post.
replies(3): >>43743893 #>>43743928 #>>43745363 #
1. nithril ◴[] No.43743928[source]
In addition, the "Massive VRAM Savings" graph states what looks like a tautology: reducing from 16 bits to 4 bits unsurprisingly leads to a 4x reduction in memory usage.
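
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The parameter count is a hypothetical value picked for illustration, not a figure from the blog post, and the estimate covers weights only: it ignores quantization scales, the KV cache, and activations, so real VRAM numbers will differ somewhat.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 27_000_000_000

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16 bits per weight
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4 bits per weight
ratio = bf16_gb / int4_gb                   # equals 16 / 4 for any model size

print(f"BF16: {bf16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, ratio: {ratio:.1f}x")
```

The ratio depends only on the bit widths, not on the model, which is why the graph by itself says nothing about whether quality is preserved.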