This is what local LLMs need—being treated like first-class citizens by the companies that make them.
That said, the first graph is misleading about the number of H100s required to run DeepSeek R1 at FP16. The model is natively FP8, so sizing it at FP16 roughly doubles the apparent memory (and GPU) requirement.
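For a rough sense of why the precision matters, here is a back-of-envelope sketch (my own numbers, not from the graph), assuming ~671B total parameters and 80 GB of HBM per H100, counting weights only:

```python
# Rough H100 count needed just to hold DeepSeek R1's weights at FP8 vs FP16.
# Assumptions: ~671B total parameters, 80 GB HBM per H100 (SXM variant).
# Ignores KV cache, activations, and parallelism overhead, so real
# deployments need more headroom than this.

import math

PARAMS = 671e9          # approximate total parameter count
H100_HBM_BYTES = 80e9   # HBM per H100

for name, bytes_per_param in (("FP8", 1), ("FP16", 2)):
    weight_bytes = PARAMS * bytes_per_param
    gpus = math.ceil(weight_bytes / H100_HBM_BYTES)
    print(f"{name}: ~{weight_bytes / 1e9:.0f} GB of weights -> >= {gpus} H100s")
```

Under these assumptions the weights alone fit in roughly 9 H100s at FP8 versus about 17 at FP16, which is why quoting FP16 numbers overstates the hardware needed.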
replies(2):