https://www.wheresyoured.at/deep-impact/
Basically, DeepSeek is _very_ efficient at inference, and that was the whole reason why it shook the industry when it was released.
Given Gemini's efficiency with long context, I would bet their attention is very efficient too.
GPT OSS uses fp4, which DeepSeek doesn’t use yet btw.
So no, big labs aren’t behind DeepSeek in efficiency. Not by much at least.
We also don't know the per-token cost for OpenAI and Anthropic models, but I would be highly surprised if it was significantly more expensive than open models anyone can use and run themselves. It's not like they aren't investing in inference research too.
I remember seeing lots of videos at the time explaining the details, but basically it came down to the kind of hardware-aware programming that used to be very common. (Although they took it to the next level by using undocumented behavior to their advantage.)
In any case, here is what Anthropic CEO Dario Amodei said about DeepSeek:
"DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)"
"DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese."
https://www.darioamodei.com/post/on-deepseek-and-export-cont...
We certainly don't have to take his word for it, but the claim is that DeepSeek's models are not much more efficient to train or run inference on than closed models of comparable quality. Furthermore, both Amodei and Sam Altman have recently claimed that inference is profitable:
Amodei: "If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.
What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."
https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop...
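To make the "each model is a company" arithmetic concrete, here is a rough sketch. Only the $100 million training cost and the $200 million revenue come from the quote. The inference cost and the next model's training cost are made-up placeholders, and the function names are hypothetical, so treat this as an illustration of the argument rather than anyone's actual numbers.

```python
# Cartoon per-model P&L, following the $100M / $200M figures in the quote.
# The inference cost and the next model's training cost are NOT given in the
# quote; the values below are hypothetical placeholders.

def model_profit(training_cost, revenue, inference_cost):
    """Lifetime profit of one model treated as its own 'company' (in $M)."""
    return revenue - training_cost - inference_cost


def company_cash_flow(current_revenue, current_inference_cost, next_training_cost):
    """Cash flow while one model earns revenue and the next one is being trained (in $M)."""
    return current_revenue - current_inference_cost - next_training_cost


TRAINING_2023 = 100   # $M, from the quote
REVENUE_2023 = 200    # $M, from the quote
INFERENCE_2023 = 50   # $M, hypothetical
NEXT_TRAINING = 1000  # $M, hypothetical ("much more expensive")

per_model = model_profit(TRAINING_2023, REVENUE_2023, INFERENCE_2023)
company = company_cash_flow(REVENUE_2023, INFERENCE_2023, NEXT_TRAINING)

print(per_model)  # 50: the 2023 model is profitable on its own
print(company)    # -850: the company as a whole still burns cash
```

With these placeholder values the single model comes out ahead while the company as a whole loses money, which is the distinction the quote is drawing.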
Altman: "If we didn’t pay for training, we’d be a very profitable company."
https://www.theverge.com/command-line-newsletter/759897/sam-...
Seriously, that claim was always completely disingenuous.
And when you're using an actual AI model to "train" (copy), it's hardly a stretch to recognize that the prior model is a core component of the training.
The first statement is about the present value of AI. The second statement is about their belief about the future value of AI.
"There is nothing else after generative AI. There are no other hypergrowth markets left in tech. SaaS companies are out of things to upsell. Google, Microsoft, Amazon and Meta do not have any other ways to continue showing growth, and when the market works that out, there will be hell to pay, hell that will reverberate through the valuations of, at the very least, every public software company, and many of the hardware ones too."
I am not doing some kind of sophisticated act of interpretation here. If AI is very little of big tech revenue, and big tech are posting massive record revenue and profits every quarter, then it cannot be the case that "there is nothing left after generative AI" and they “do not have any other ways to continue showing growth” — what is left is whatever is driving all that revenue and profit growth right now!