
Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points by kiyanwang | 4 comments
smy20011 ◴[] No.43112235[source]
Did they? DeepSeek spent about 17 months achieving SOTA results on a significantly smaller budget. And while xAI's model isn't a substantial leap beyond DeepSeek R1, it uses 100 times more compute.

Given $3 billion, xAI would invest $2.5 billion in GPUs and $0.5 billion in talent; DeepSeek would invest $1 billion in GPUs and $2 billion in talent.

I would argue that the latter approach (DeepSeek's) is more scalable. It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving a 10x increase in compute is more feasible.
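
To make the trade-off concrete, here is a toy model. All numbers and the efficiency multiplier are illustrative assumptions, not real figures; the premise is just that talent buys algorithmic efficiency, which multiplies what each GPU dollar yields:

    # Toy model of the GPU-vs-talent split above. The "2x effective
    # compute per $1B of talent" multiplier is an invented assumption,
    # purely for illustration.
    def effective_compute(gpu_spend_b, talent_spend_b, eff_per_talent_b=2.0):
        # Raw compute scales with GPU spend; talent compounds it via
        # algorithmic efficiency gains.
        return gpu_spend_b * (eff_per_talent_b ** talent_spend_b)

    xai      = effective_compute(gpu_spend_b=2.5, talent_spend_b=0.5)  # ~3.5
    deepseek = effective_compute(gpu_spend_b=1.0, talent_spend_b=2.0)  # 4.0

Under that assumed multiplier, the talent-heavy split comes out ahead despite buying far fewer GPUs; pick a weaker multiplier and the GPU-heavy split wins instead.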

replies(10): >>43112269 #>>43112330 #>>43112430 #>>43112606 #>>43112625 #>>43112895 #>>43112963 #>>43115065 #>>43116618 #>>43123381 #
sigmoid10 ◴[] No.43112269[source]
>It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving a 10x increase in compute is more feasible.

The article explains how in reality the opposite is true, especially when you look at it long term: compute power grows exponentially, humans do not.
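
Back-of-the-envelope, with assumed growth rates (compute per dollar doubling every ~2 years, the researcher pool growing 20% per year; both rates are hypothetical):

    import math

    # Years until each resource grows 100x, under the assumed rates above.
    years_compute = math.log2(100) * 2               # ~13.3 years
    years_people  = math.log(100) / math.log(1.20)   # ~25.3 years

At those rates, compute reaches 100x in roughly half the time the talent pool does.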

replies(4): >>43112294 #>>43112314 #>>43112378 #>>43112615 #
OtherShrezzing ◴[] No.43112378[source]
Humans don't grow exponentially indefinitely. But there are only something on the order of 100k AI researchers employed in the big labs right now. Meanwhile, there are around 20 million software engineers globally and around 200k math graduates per year.

The number of humans who could feasibly work on this problem is pretty high. The labs could grow an order of magnitude and still only be tapping into the top 1-2% of engineers and mathematicians; they could grow two orders of magnitude before they'd absorbed all of the above-average engineers and mathematicians in the world.
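
A quick sanity check on those figures, taking the numbers above at face value:

    # ~100k researchers today; "above-average" = top half of ~20M engineers.
    researchers = 100_000
    above_average_engineers = 20_000_000 // 2        # 10M

    print(researchers * 10 <= above_average_engineers)    # True: 1M fits easily
    print(researchers * 100 <= above_average_engineers)   # True: 10M, exactly at the limit

So the two-orders-of-magnitude claim checks out, though only just.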

replies(1): >>43112418 #
sigmoid10 ◴[] No.43112418[source]
I'd actually say the market is stretched pretty thin by now. I've been an AI researcher for a decade, and what passes for an AI researcher or engineer these days is borderline worthless. You can get a lot of people who can use scripts and middleware like frontend Lego sets to build things, but I'd say there are fewer than 1k people in the world right now who can actually meaningfully improve algorithmic design. There are a lot more people out there who do systems design and cloud ops, so it's only when you go for scaling that you'll find plentiful human brainpower.
replies(1): >>43112879 #
llm_trw ◴[] No.43112879[source]
Do you know where people who are interested in research congregate? Every forum, meetup, or journal gets overwhelmed by bullshit within a year of being good.
replies(1): >>43120770 #
sigmoid10 ◴[] No.43120770[source]
Universities (at least certain ones) and startups (more in absolute terms than universities, but with a much bigger fraction of swindlers). Most blogs and forums are garbage. If you're not inside these ecosystems, try to find out who the smart, talented people are by reading influential papers. Then you can start following them on X, LinkedIn, etc., and you'll often see what they're up to next. For example, there's a pretty clear research-paper and hiring trail of certain people that eventually led to GPT-4, even though OpenAI never published anything on the architecture.
replies(1): >>43120865 #
llm_trw ◴[] No.43120865[source]
I am in correspondence with a number of worthwhile authors; it's just that there isn't any place where they congregate in the (semi-)open, and without the weirdos who actually do stuff with the models, you're missing out on a lot.

My favorite example, which I can never share in polite company, is that the best (and still SOTA) image segmentation algorithm I ever saw was built by a guy labeling parts of the vagina for his Stable Diffusion fine-tuning pipeline. I used what he'd done as the basis for a document segmentation model (also still SOTA two years later).

Found him on a subreddit about Stable Diffusion that's now completely overrun by shitposters, and he's been banned (of course).

replies(1): >>43125935 #
sigmoid10 ◴[] No.43125935{3}[source]
It's pretty easy nowadays to come up with a narrow-domain SOTA in image tasks: all you need to do is label some pictures and do a bit of hyperparameter search, which can literally be done by high schoolers on a laptop. And that's exactly what they do in those subreddits where everyone primarily cares about creating explicit content. The real frontier for algorithmic development is large domains (which by default also need a lot more data). But there actually are some big-name explicit-content platforms doing research in this area, and they have shown somewhat interesting results.