https://www.wheresyoured.at/deep-impact/
Basically, DeepSeek is _very_ efficient at inference, and that was the whole reason why it shook the industry when it was released.
We also don't know the per-token cost for OpenAI and Anthropic models, but I would be highly surprised if it were significantly higher than that of open models anyone can download and run themselves. It's not as if they aren't also investing in inference research.
I remember seeing lots of videos at the time explaining the details, but basically it came down to the kind of hardware-aware programming that used to be very common. (Although they took it to the next level by using undocumented behavior to their advantage.)
Seriously, that claim was always completely disingenuous.
And when you're using an actual AI model to "train" (really, to copy from), it takes no insight at all to see that the prior model is a core component of the training.