
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 40 comments
1. swiftcoder ◴[] No.42948127[source]
> having 10,000 H100s just means that you can do 625 times more experiments than s1 did

I think the ball is very much in their court to demonstrate they actually are using their massive compute in such a productive fashion. My BigTech experience would tend to suggest that frugality went out the window the day the valuation took off, and they are in fact just burning compute for little gain, because why not...

replies(5): >>42948369 #>>42948616 #>>42948712 #>>42949773 #>>42953287 #
2. whizzter ◴[] No.42948369[source]
Mainly it points to a non-scientific "bigger is better" mentality, and the researchers probably didn't mind playing around with the power because "scale" is "cool".

Remember that the Lisp AI-lab people were working on unsolved problems on absolute potatoes of computers back in the day. We have a semblance of progress toward solutions now, but so much of it has been brute force (even if there have been improvements in the field).

The big question is whether this insane spending has pulled the rug out from under real progress, sending us into another AI winter of disillusionment, or whether there is enough real progress just around the corner to give investors hope in a post-DeepSeek valuation hangover.

replies(2): >>42948531 #>>42950004 #
3. wongarsu ◴[] No.42948531[source]
We are in a phase where costs are really coming down. We had this phase from GPT2 to about GPT4 where the key to building better models was just building bigger models and training them for longer. But since then a lot of work has gone into distillation and other techniques to make smaller models more capable.

If there is another AI winter, it will be more like the dotcom bubble: lots of important work got done during that bubble, but many of today's big tech companies were built on the fruits of that labor in the decade after it burst.

4. svantana ◴[] No.42948616[source]
Besides that, AI training (aka gradient descent) is not really an "embarrassingly parallel" problem. At some point, there are diminishing returns on adding more GPUs, even though a lot of effort is going into making it as parallel as possible.
replies(1): >>42953005 #
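A minimal sketch of the diminishing returns described above, using an Amdahl's-law-style model in which some fraction of each training step (gradient synchronization, optimizer step, data stalls) does not parallelize. The 5% serial fraction is an assumption for illustration, not a measured number.

    # Toy Amdahl's-law-style model; serial_fraction is assumed, not measured.
    def speedup(n_gpus, serial_fraction=0.05):
        # step time ~ serial part + parallel part spread across n_gpus
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_gpus)

    for n in (1, 8, 64, 512, 10_000):
        print(f"{n:>6} GPUs -> {speedup(n):5.1f}x")
    # Under this toy model 10,000 GPUs buy roughly a 20x speedup, nowhere near
    # 10,000x, which is why so much effort goes into shrinking the serial share.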
5. gessha ◴[] No.42948712[source]
This is pure speculation on my part but I think at some point a company's valuation became tied to how big their compute is so everybody jumped on the bandwagon.
replies(3): >>42948854 #>>42949513 #>>42951813 #
6. syntaxing ◴[] No.42948854[source]
Matt Levine tangentially talked about this on his podcast this past Friday (or was it the one before?). His take was that valuing these companies by their compute size makes some sense, since those chips are very valuable. At a minimum, the chips are an asset that can serve as collateral.
replies(5): >>42949098 #>>42949373 #>>42952809 #>>42952963 #>>42953590 #
7. jxdxbx ◴[] No.42949098{3}[source]
I hear this a lot, but what the hell. It's still computer chips. They depreciate. Short supply won't last forever. Hell, GPUs burn out. It seems like using ice sculptures as collateral, and then spring comes.
replies(3): >>42949241 #>>42949424 #>>42950677 #
8. baxtr ◴[] No.42949241{4}[source]
If so, wouldn't it be the first time in history that available processing power went unused?

In my experience CPU/GPU power is used up as much as possible. Increased efficiency just leads to more demand.

replies(1): >>42951798 #
9. SecretDreams ◴[] No.42949373{3}[source]
> His take was that valuing these companies by their compute size makes some sense, since those chips are very valuable.

Are they, though? Valuable today, yes, but are they actually driving ROI? Or are they just an asset nobody is meaningfully utilizing, one that happens to juice the stock?

10. sixothree ◴[] No.42949424{4}[source]
Year over year gains in computing continue to slow. I think we keep forgetting that when talking about these things as assets. The thing determining their value is supply, which is tightly controlled, like diamonds.
replies(3): >>42949510 #>>42952694 #>>42952765 #
11. adrianN ◴[] No.42949510{5}[source]
They have a fairly limited lifetime even if progress stands still.
replies(1): >>42950318 #
12. JKCalhoun ◴[] No.42949513[source]
So, "No one was ever fired for ... buying more server infrastructure."
replies(1): >>42954343 #
13. jerf ◴[] No.42949773[source]
This claim is mathematically nonsensical. It implies a more-or-less linear relationship, that more is always better. But there's no reason to limit that to H100s. Conventional servers are, if anything, rather more established in their ability to generate value: whatever potential AI servers may have to become more important than conventional servers in the future, we know how to use conventional servers to generate value now.

And thus, by this logic, every company in the world should just be buying as many servers as they can get their hands on, because More Servers = More Value.

Obviously, this is not happening. It doesn't take much analysis to start listing the many and manifold reasons why, and many of those reasons apply to GPUs as well. If everything in AWS got 10x faster overnight, that would not create a situation where everyone suddenly starts grabbing more servers in AWS. Obviously everyone would start trimming down, even if in a few years' time they'd find some way to use this burst of power such that they could use more later. This can't happen overnight, though. It would take time, and not "weeks" or "months" but "years" at scale.

Incorporating the important variable of time in the analysis, if AIs become literally hundreds of times cheaper to run, today, then it is perfectly logical that the near-term demand for the hardware to run them is also going to go way, way down. However much potential AI may have, it is fairly clear looking out at the AI landscape right now that there isn't really anyone out there unlocking vast amounts of value and sitting there wringing their hands because they just can't get more GPU compute. The GPU rush has been from fear that someone will figure out how to "really" unlock AI and then they'll be stuck without the hardware to compete.

It may be the case that vastly cheaper AI will in fact be part of unlocking that value, and that as the AI industry grows it will grow faster as a result... but that's still going to be on a multi-year time frame, not a tomorrow time frame. And all those GPUs and all those valuations are still broadly based on them being valuable real soon now, not in a few years, and all those GPU purchases are on the assumption they need them now, or on a timeframe where we can't be waiting around, rather than waiting for some rounds of exponential doublings to bring price down. The hardware curve in 5 years may be higher but the curve in the next year would be lower, and by a lot.

And, you know, who's to say we're done? I doubt there's another 100x in there, but is someone going to eke out another 2x improvement? Or a 10x improvement? Making it easier to run lots of experiments makes it much more likely for that to happen. I'm skeptical of another 10x general improvement but 10x improvements for specific, important use cases I can't rule out.

Edit: I should also point out this is an extremely common pattern in technology in general. Often the very hardest part is producing a thing that does a particular task at all. Once we have it in hand, once we can use it and learn how it operates and what its characteristic operating modes are, once we can try modifications to it in the real world and see what happens, optimizing it becomes much easier, sometimes explosively so by comparison. Taking any first iteration of a tech that is practical and then trying to straight-line demand based on it is silly, in all sorts of ways and all directions. The internal combustion engine, for example, has had a myriad of impacts on the world and certainly after various improvements many, many millions if not billions of them have been made... but any company that reacted to the first couple of cars and just went ballistic buying those first-generation internal combustion engines would have lost everything, and rather quickly.

replies(1): >>42949878 #
14. ◴[] No.42950004[source]
15. throwup238 ◴[] No.42950318{6}[source]
Last I checked AWS 1-year reserve pricing for an 8x H100 box more than pays for the capital cost of the whole box, power, and NVIDIA enterprise license, with thousands left over for profit. On demand pricing is even worse. For cloud providers these things pay for themselves quickly and print cash afterwards. Even the bargain basement $2/GPU/hour pays it off in under two years.
replies(1): >>42952661 #
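Rough back-of-the-envelope math for the payback claim above; the box cost and utilization figures here are assumptions for illustration, not quoted prices.

    # Assumed figures: ~$250k all-in for an 8x H100 server, the $2/GPU/hour
    # "bargain basement" rate from the comment, and 90% utilization.
    box_cost = 250_000           # USD, assumed capital cost
    rate_per_gpu_hour = 2.0      # USD
    gpus = 8
    utilization = 0.9

    annual_revenue = rate_per_gpu_hour * gpus * 24 * 365 * utilization
    payback_years = box_cost / annual_revenue
    print(f"annual revenue ~${annual_revenue:,.0f}, payback ~{payback_years:.1f} years")
    # ~$126k/year and a payback of roughly two years at the $2 floor;
    # reserved or on-demand rates pay the box off considerably faster.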
16. ecocentrik ◴[] No.42950677{4}[source]
That's the wrong take. Depreciated and burned-out chips get replaced, and total compute capacity typically increases over time. Efficiency gains are also calculated and projected over time. Seasons are inevitable and cyclical: spring might be here, but winter is coming.
17. littlestymaar ◴[] No.42951798{5}[source]
I think you're missing the point: the H100 isn't going to remain useful for long. Would you consider Tesla or Pascal graphics cards collateral? That's what those H100s will look like in just a few years.
replies(2): >>42952713 #>>42953289 #
18. tyfon ◴[] No.42951813[source]
I don't think you need to speculate too hard. On CNBC they are not tracking revenue, profits or technical breakthroughs, but how much the big companies are spending (on gpus). That's the metric!
replies(5): >>42951860 #>>42952948 #>>42953193 #>>42954800 #>>42955651 #
19. Mistletoe ◴[] No.42951860{3}[source]
This feels like one of those stats they show from 1929 and everyone is like “and they didn’t know they were in a bubble?”
20. sdenton4 ◴[] No.42952661{7}[source]
Labor! You need it to turn the bill of sale into a data center and keep it running. The bargain basement would be even cheaper otherwise...
21. spamizbad ◴[] No.42952694{5}[source]
> Year over year gains in computing continue to slow.

This isn't true in the AI chip space (yet). And so much of this isn't just about compute but about the memory.

replies(1): >>42953119 #
22. ijidak ◴[] No.42952713{6}[source]
Yeah, exactly! I've got some 286, 386, and 486 CPUs that I want to claim as collateral!
23. ijidak ◴[] No.42952765{5}[source]
Honestly, I don't fully understand the reason for this shortage.

Isn't it because we insist on only using the latest nodes from a single company for manufacture?

I don't understand why we can't use older process nodes to boost overall GPU making capacity.

Can't we have tiers of GPU availability?

Why is Nvidia not diversifying aggressively to Samsung and Intel, no matter the process node?

Can someone explain?

I've heard packaging is also a concern, but can't you get Intel to figure that out with a large enough commitment?

replies(1): >>42956572 #
24. ijidak ◴[] No.42952809{3}[source]
I asked this elsewhere, but, I don't fully understand the reason for the critical GPU shortage.

Isn't it because NVIDIA insists on only using the latest nodes from a single company (TSMC) for manufacture?

I don't understand why we can't use older process nodes to boost overall GPU making capacity.

Can't we have tiers of GPU availability, some on cutting-edge nodes, others built on older Intel and Samsung nodes?

Why is Nvidia not diversifying aggressively to Samsung and Intel, no matter the process node?

Can someone explain?

I've heard packaging is also a concern, but can't you get Intel to figure that out with a large enough commitment?

(Also, I know NVIDIA has some capacity at Samsung. But why not go all out, even using GlobalFoundries?)

25. RobotToaster ◴[] No.42952948{3}[source]
"But tulip sales keep increasing!"
26. aorloff ◴[] No.42952963{3}[source]
If you are a cloud provider renting them out, sure.

Otherwise you'd better keep them humming while you try to find a business model, because they certainly aren't getting any newer as chips.

27. janalsncm ◴[] No.42953005[source]
What? It definitely is.

Data parallelism, model parallelism, parameter server to workers, MoE itself can be split up, etc.

But even if it weren't, you can simply parallelize training runs with slight variations in hyperparameters. That is what the article is describing.
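The last point, independent runs with varied hyperparameters, is the classic embarrassingly parallel case. A minimal sketch: train_once here is a hypothetical stand-in for a real training job, and the loss formula is a placeholder, not a real model.

    # Embarrassingly parallel hyperparameter sweep: each run is independent,
    # so workers never need to communicate. train_once is hypothetical.
    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    def train_once(config):
        lr, batch_size = config
        loss = 1.0 / (lr * batch_size)   # placeholder result, not a real model
        return config, loss

    if __name__ == "__main__":
        grid = list(product([1e-4, 3e-4, 1e-3], [32, 64, 128]))   # 9 configs
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(train_once, grid))
        best_config, best_loss = min(results, key=lambda r: r[1])
        print("best config:", best_config, "loss:", best_loss)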

28. eek2121 ◴[] No.42953119{6}[source]
From a per-mm2 performance standpoint, things absolutely have slowed considerably. Gains are primarily being eked out via process advantages (which have slowed down) and larger chips (which have an ever-shrinking limit depending on the tech used).

Chiplets have slowed the slowdown in AI, but you can look at how much things have slowed in the gaming space to get an idea of what is coming for enterprise.

29. LeifCarrotson ◴[] No.42953193{3}[source]
I probably don't have to repeat it, but this is a perfect example of Goodhart's Law: when a metric is used as a target, it loses its effectiveness as a metric.

If you were a reporter who didn't necessarily understand how to value a particular algorithm or training operation, but you wanted a simple number to compare the amount of work OpenAI vs. Google vs Facebook are putting into their models, yeah, it makes sense. How many petaflops their datacenters are churning through in aggregate is probably correlated to the thing you're trying to understand. And it's probably easier to look at their financials and correlate how much they've spent on GPUs to how many petaflops of compute they need.

But when your investors are giving you more money based on how well they perceive you're doing, and their perception is not an oracle but is instead directly based on how much money you're spending... the GPUs don't actually need to do anything other than make number go up.

30. deadbabe ◴[] No.42953287[source]
For starters every employee has an H100 under their desk.
31. baxtr ◴[] No.42953289{6}[source]
Not sure I do tbh.

Any asset depreciates over time. But they usually get replaced.

My 286 was replaced by a faster 386, and that by an even faster 486.

I’m sure you see a naming pattern there.

replies(4): >>42953767 #>>42955307 #>>42955541 #>>42962882 #
32. dghlsakjg ◴[] No.42953590{3}[source]
That's a great way to value a company that is going bankrupt.

But, I'm not going to value an operating construction company based on how many shovels or excavators they own. I'm going to want to see them putting those assets to productive use.

33. kgwgk ◴[] No.42953767{7}[source]
> Any asset depreciates over time.

That's why "those chips are very valuable" is not necessarily a good way to value companies - and it isn't unless they can extract the value from the chips before the chips become worthless.

> But they usually get replaced.

They usually produce enough income to cover depreciation so you actually have the cash to replace them.

34. genewitch ◴[] No.42954343{3}[source]
Walmart has massive, idle datacenters full of running machines doing nothing.
35. B56b ◴[] No.42954800{3}[source]
They absolutely are tracking revenues and profits on CNBC. What are you talking about?
36. littlestymaar ◴[] No.42955307{7}[source]
And that's why such assets represent only a marginal part of a valuation. (And if you look at the accounting, this depreciation is usually done over three years for IT hardware, so most of these chips have already lost half of their accounting value on the balance sheet.)
37. baq ◴[] No.42955541{7}[source]
My 1070 was replaced by… nothing; I moved it from a Haswell box to an Alder Lake box.

Given that inference time will soon be extremely valuable with agents and <thinking> models, H100s may yet be worth something in a couple of years.

38. ur-whale ◴[] No.42955651{3}[source]
> but how much the big companies are spending (on gpus). That's the metric!

Burn rate based valuations!

The 2000's are back in full force!

39. nl ◴[] No.42956572{6}[source]
> Isn't it because we insist on only using the latest nodes from a single company for manufacture?

TSMC was way ahead of anyone else introducing 5nm. There's a long lead time porting a chip to a new process from a different manufacturer.

> I don't understand why we can't use older process nodes to boost overall GPU making capacity.

> Can't we have tiers of GPU availability?

Nvidia does this. You can get older GPUs, but more performance is better for performance-sensitive applications like training or running LLMs.

Higher performance needs better manufacturing processes.

40. mvc ◴[] No.42962882{7}[source]
> My 286 was replaced by a faster 386, and that by an even faster 486.

How much was your 286 chip worth when you bought your 486?