
Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points kiyanwang | 2 comments
smy20011 ◴[] No.43112235[source]
Did they? Deepseek spent about 17 months achieving SOTA results with a significantly smaller budget. While xAI's model isn't a substantial leap beyond Deepseek R1, it utilizes 100 times more compute.

Given $3 billion, xAI would invest $2.5 billion in GPUs and $0.5 billion in talent; Deepseek would invest $1 billion in GPUs and $2 billion in talent.

I would argue that the latter approach (Deepseek's) is more scalable. It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving a 10x increase in compute is more feasible.

replies(10): >>43112269 #>>43112330 #>>43112430 #>>43112606 #>>43112625 #>>43112895 #>>43112963 #>>43115065 #>>43116618 #>>43123381 #
wordofx ◴[] No.43112625[source]
Deepseek was a crypto mining operation before they pivoted to AI. They have an insane number of GPUs lying around, so we have no idea how much compute they have compared to xAI.
replies(2): >>43114096 #>>43116339 #
1. miki123211 ◴[] No.43114096[source]
Crypto GPUs have nothing to do with AI GPUs.

Crypto mining is an embarrassingly parallel problem, requiring little to no communication between GPUs. To a first approximation, in crypto, 10x-ing the number of "cores" per GPU, 10x-ing the number of GPUs per rig, and 10x-ing the number of rigs you own are all basically equivalent. An infinite number of extremely slow GPUs would do just as well as one infinitely fast GPU. This is why consumer GPUs are great for crypto.
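To make the "embarrassingly parallel" point concrete, here is a toy proof-of-work search in Python. This is a sketch, not a real miner: each simulated worker scans a disjoint slice of the nonce space and never needs to communicate with the others, so adding workers scales throughput linearly.

```python
import hashlib

def search(block_data: bytes, start: int, stride: int, difficulty: int, limit: int):
    """One 'worker' scans its own slice of the nonce space; no coordination needed."""
    target = "0" * difficulty
    for nonce in range(start, limit, stride):
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
    return None

# Two independent "GPUs": one covers even nonces, the other odd ones.
# Neither needs to talk to the other -- the first hit anywhere wins.
results = [search(b"block", start, 2, 3, 200_000) for start in (0, 1)]
hit = next(r for r in results if r is not None)
print(hit)
```

Because the workers share nothing, it makes no difference whether they sit in one machine or on opposite sides of the planet, which is exactly why slow consumer GPUs were fine for mining.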

AI is the opposite. In AI, you need extremely fast communication between GPUs. This means getting as much memory per GPU as possible (to make communication less necessary) and putting all the GPUs in one datacenter.
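By contrast, data-parallel training must synchronize gradients across every GPU at every step, typically via an all-reduce. A toy Python sketch of that synchronization (real systems use NCCL over NVLink/InfiniBand, not Python lists):

```python
# Toy all-reduce: every simulated "GPU" must incorporate every other GPU's
# gradients each step, which is why interconnect bandwidth, not just raw
# FLOPS, gates training throughput.
def allreduce_mean(grads):
    """grads: list of per-worker gradient vectors; returns the averaged
    gradient replicated to every worker."""
    n = len(grads)
    summed = [sum(g[i] for g in grads) for i in range(len(grads[0]))]
    return [[s / n for s in summed] for _ in range(n)]

workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # gradients from 3 "GPUs"
synced = allreduce_mean(workers)
print(synced[0])  # every worker now holds the same averaged gradient
```

The full gradient (billions of values in a real model) crosses the interconnect on every training step, so slow links between GPUs stall the whole cluster.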

Consumer GPUs, which were used for crypto, don't support the fast communication technologies needed for AI training, and they don't come in the 80 GB memory versions that AI labs need. This is Nvidia's price-differentiation strategy.

replies(1): >>43114483 #
2. miohtama ◴[] No.43114483[source]
No relevant cryptocurrency has been mined on GPUs for a long time.

And some coins were deliberately designed to be less parallel. Ethereum, for example, used a DAG that miners had to keep in memory, requiring at least 1 GB of RAM; raw GPU compute alone was not enough.

https://ethereum.stackexchange.com/questions/1993/what-actua...

Also, any GPUs from that era are now several generations old, so their FLOPS/watt likely makes them irrelevant.