
Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points | kiyanwang | 3 comments
1. rfoo | No.43112334
I'm pretty skeptical of that 75% on GPQA Diamond for a non-reasoning model. I hope xAI makes the Grok 3 API available next week so I can run it against some private evaluations and see whether it's really this good.
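
For context, by "private evaluations" I mean a small set of held-out questions that have never been posted publicly, scored directly against the API. A rough sketch of such a harness, assuming an OpenAI-compatible chat endpoint (the URL and model name below are guesses on my part, not confirmed xAI API details):

```python
# Minimal sketch: score a private multiple-choice eval set against a chat API.
# API_URL and MODEL are placeholders/assumptions, not confirmed xAI details.
import requests

API_URL = "https://api.x.ai/v1/chat/completions"  # assumed OpenAI-compatible endpoint
API_KEY = "YOUR_KEY_HERE"
MODEL = "grok-3"  # hypothetical model identifier

private_eval = [
    {"question": "...", "choices": ["A) ...", "B) ...", "C) ...", "D) ..."], "answer": "B"},
    # ...more held-out items that have never appeared on the public web
]

def ask(question, choices):
    prompt = question + "\n" + "\n".join(choices) + "\nAnswer with a single letter."
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": MODEL,
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0},
        timeout=60,
    )
    # Take the first character of the reply as the chosen letter.
    return resp.json()["choices"][0]["message"]["content"].strip()[:1].upper()

correct = sum(ask(item["question"], item["choices"]) == item["answer"]
              for item in private_eval)
print(f"accuracy: {correct / len(private_eval):.1%}")
```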

Another nit-pick: I don't think DeepSeek had 50k Hopper GPUs. Maybe they have 50k now, after getting the world's attention and having a state-sponsored grey market back them, but that 50k figure is almost certainly made up. Over the past year DeepSeek's intern recruitment ads only ever mentioned "unlimited access to 10k A100s", suggesting they had very limited H100/H800 capacity, and most of their research ideas were validated on smaller models on an Ampere cluster. The 10k A100 figure also matches a cluster their parent hedge fund announced a few years ago. All in all, my estimate is that they had somewhat more A100s (maybe 20k) and single-digit thousands of H800s.

replies(2): >>43112764, >>43118581
2. kgwgk | No.43112764
> my estimation is they had more (maybe 20k) A100s, and single-digit thousands of H800s.

Their technical report on DeepSeek-V3 says it "is trained on a cluster equipped with 2048 NVIDIA H800 GPUs." If they had even high single-digit thousands of H800s, they would probably have thrown more compute at it instead of waiting almost two months for the run to finish.
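
If I recall correctly, the same report puts the full training cost at roughly 2.788M H800 GPU-hours. Assuming that figure, a quick back-of-envelope check (sketch below) gives about two months of wall-clock time on a 2048-GPU cluster, which lines up with the timeline.

```python
# Back-of-envelope check using figures attributed to the DeepSeek-V3 report:
# ~2.788M H800 GPU-hours of total training on a 2048-GPU cluster.
total_gpu_hours = 2_788_000
cluster_size = 2048

wall_clock_days = total_gpu_hours / cluster_size / 24
print(f"{wall_clock_days:.0f} days")  # ~57 days, i.e. almost two months
```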

3. riku_iki | No.43118581
> I'm pretty skeptical of that 75% on GPQA Diamond for a non-reasoning model.

Could that benchmark simply have leaked into the training data, as so many others have?
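
For what it's worth, the usual way to probe this is an n-gram overlap check between benchmark questions and the training corpus, roughly in the spirit of the contamination analyses some labs describe in their technical reports. A minimal sketch with placeholder inputs (the function names and toy data are mine, not from any particular lab's pipeline):

```python
# Minimal sketch of an n-gram contamination check: flag benchmark questions
# whose 13-grams also appear in a sample of the training corpus.
# The benchmark and corpus inputs are placeholders; a real check needs the actual data.
def ngrams(text, n=13):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def flag_contaminated(benchmark_questions, corpus_documents, n=13):
    corpus_ngrams = set()
    for doc in corpus_documents:
        corpus_ngrams |= ngrams(doc, n)
    # A question is flagged if any of its n-grams shows up verbatim in the corpus.
    return [q for q in benchmark_questions if ngrams(q, n) & corpus_ngrams]

# Toy usage with placeholder data:
suspects = flag_contaminated(
    benchmark_questions=["Which of the following best describes ..."],
    corpus_documents=["... scraped web text that might contain the question ..."],
)
print(len(suspects), "possibly leaked questions")
```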