
Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points | kiyanwang | 3 comments
1. rfoo | No.43112334
I'm pretty skeptical of that 75% on GPQA Diamond for a non-reasoning model. I hope xAI makes the Grok 3 API available next week so I can run it against some private evaluations and see whether it's really this good.
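
For context, by "private evaluations" I mean a small set of held-out questions that have never been posted publicly, scored directly against the API. A rough sketch of such a harness, assuming an OpenAI-compatible chat endpoint (the URL and model name below are guesses on my part, not confirmed xAI API details):

```python
# Minimal sketch: score a private multiple-choice eval set against a chat API.
# API_URL and MODEL are placeholders/assumptions, not confirmed xAI details.
import requests

API_URL = "https://api.x.ai/v1/chat/completions"  # assumed OpenAI-compatible endpoint
API_KEY = "YOUR_KEY_HERE"
MODEL = "grok-3"  # hypothetical model identifier

private_eval = [
    {"question": "...", "choices": ["A) ...", "B) ...", "C) ...", "D) ..."], "answer": "B"},
    # ...more held-out items that have never appeared on the public web
]

def ask(question, choices):
    prompt = question + "\n" + "\n".join(choices) + "\nAnswer with a single letter."
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": MODEL,
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0},
        timeout=60,
    )
    # Take the first character of the reply as the chosen letter.
    return resp.json()["choices"][0]["message"]["content"].strip()[:1].upper()

correct = sum(ask(item["question"], item["choices"]) == item["answer"]
              for item in private_eval)
print(f"accuracy: {correct / len(private_eval):.1%}")
```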

Another nit-pick: I don't think DeepSeek had 50k Hopper GPUs. Maybe they have 50k now, after getting the world's attention and having a state-sponsored grey market back them, but that 50k figure is almost certainly made up. Over the past year DeepSeek's intern recruitment ads only ever mentioned "unlimited access to 10k A100s", suggesting they had very limited H100/H800 capacity, and most of their research ideas were validated on smaller models on an Ampere cluster. The 10k A100 figure also matches a cluster their parent hedge fund announced a few years ago. All in all, my estimate is that they had somewhat more A100s (maybe 20k) and single-digit thousands of H800s.

replies(2): >>43112764, >>43118581
2. kgwgk | No.43112764
> my estimation is they had more (maybe 20k) A100s, and single-digit thousands of H800s.

Their technical report on DeepSeek-V3 says it "is trained on a cluster equipped with 2048 NVIDIA H800 GPUs." If they had even high single-digit thousands of H800s, they would probably have thrown more compute at it instead of waiting almost two months for the run to finish.
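
If I recall correctly, the same report puts the full training cost at roughly 2.788M H800 GPU-hours. Assuming that figure, a quick back-of-envelope check (sketch below) gives about two months of wall-clock time on a 2048-GPU cluster, which lines up with the timeline.

```python
# Back-of-envelope check using figures attributed to the DeepSeek-V3 report:
# ~2.788M H800 GPU-hours of total training on a 2048-GPU cluster.
total_gpu_hours = 2_788_000
cluster_size = 2048

wall_clock_days = total_gpu_hours / cluster_size / 24
print(f"{wall_clock_days:.0f} days")  # ~57 days, i.e. almost two months
```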

3. riku_iki | No.43118581
> I'm pretty skeptical of that 75% on GPQA Diamond for a non-reasoning model.

Could that benchmark simply have leaked into the training data, as so many others have?
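
For what it's worth, the usual way to probe this is an n-gram overlap check between benchmark questions and the training corpus, roughly in the spirit of the contamination analyses some labs describe in their technical reports. A minimal sketch with placeholder inputs (the function names and toy data are mine, not from any particular lab's pipeline):

```python
# Minimal sketch of an n-gram contamination check: flag benchmark questions
# whose 13-grams also appear in a sample of the training corpus.
# The benchmark and corpus inputs are placeholders; a real check needs the actual data.
def ngrams(text, n=13):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def flag_contaminated(benchmark_questions, corpus_documents, n=13):
    corpus_ngrams = set()
    for doc in corpus_documents:
        corpus_ngrams |= ngrams(doc, n)
    # A question is flagged if any of its n-grams shows up verbatim in the corpus.
    return [q for q in benchmark_questions if ngrams(q, n) & corpus_ngrams]

# Toy usage with placeholder data:
suspects = flag_contaminated(
    benchmark_questions=["Which of the following best describes ..."],
    corpus_documents=["... scraped web text that might contain the question ..."],
)
print(len(suspects), "possibly leaked questions")
```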