
2127 points bakugo | 4 comments
lysace No.43163135
It's fascinating how close these companies are to each other. Some company comes up with something clever/ground-breaking and everyone else has implemented it a few weeks later.

Hard not to think of Kurzweil's Law of Accelerating Returns.

replies(4): >>43163205 #>>43163347 #>>43163364 #>>43163723 #
azinman2 No.43163347
It’s extremely unlikely that everyone is copying within a few weeks when the models themselves take many weeks, if not longer, to train. Great minds think alike, and everyone is influencing everyone. The history of innovation is filled with examples of similar discoveries made around the same time in totally disconnected parts of the world. Now, with the rate of publishing and the openness of the internet, you’re bound to get even more of that.
replies(4): >>43163393 #>>43163423 #>>43165162 #>>43173752 #
1. lysace No.43163393
Isn't the reasoning thing essentially a bolt-on to existing trained models? Like basically a meta-prompt?
replies(3): >>43163427 #>>43163433 #>>43163725 #
2. pertymcpert No.43163427
Somewhat but not exactly? I think the models need to be trained to think.
3. azinman2 No.43163433
No.

DeepSeek, and now related projects, have shown it’s possible to add reasoning to existing models via SFT, but that’s not the same thing as a prompt. And if you look at R1, they use a blend of techniques to get reasoning.

For Anthropic to have a hybrid model where you can control this, it will have to be built into the model directly in its training and probably architecture as well.

If you’re a competent company filled with the best AI minds and a frontier model, you’re not just purely copying… you’re taking ideas while innovating and adapting.
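To make the SFT point concrete: the distillation recipe amounts to packing a stronger model's reasoning trace into the supervised target, so the student learns to emit its chain of thought before the answer. Here is a minimal, hypothetical sketch of that data-formatting step; the function name and the `<think>` tag convention are illustrative assumptions, not any lab's exact spec.

```python
# Hypothetical sketch of building an SFT example from a reasoning trace.
# The student model is trained to produce the trace, then the final answer.

def make_sft_example(question: str, trace: str, answer: str) -> dict:
    """Pack a teacher model's chain of thought into one supervised target."""
    target = f"<think>\n{trace}\n</think>\n{answer}"
    return {"prompt": question, "completion": target}

example = make_sft_example(
    "What is 17 * 24?",
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408",
    "408",
)
```

Fine-tuning an existing model on pairs like this is cheap relative to pretraining, which is why the "bolt-on" intuition isn't entirely wrong for the distilled models, even though the frontier reasoning models were trained differently.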

4. Philpax No.43163725
The fundamental innovation is training the model to reason through reinforcement learning. You can train existing models on traces from these reasoning models to get within the same ballpark, but taking it further requires doing the RL yourself.
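The RL side hinges on a reward signal for sampled responses. The R1 paper describes simple rule-based rewards (answer accuracy plus a format reward for well-formed thinking blocks) rather than a learned reward model. A toy, illustrative version of such a reward function, with the tag format and reward weights as assumptions of mine:

```python
import re

def reward(response: str, gold_answer: str) -> float:
    """Toy rule-based reward in the spirit of R1's recipe (illustrative).
    Real setups verify answers programmatically (math checkers, test
    suites), not by exact string match."""
    r = 0.0
    m = re.fullmatch(r"<think>\n?(.*?)\n?</think>\n?(.*)", response, re.DOTALL)
    if m:
        r += 0.1  # format reward: response contains a well-formed think block
        if m.group(2).strip() == gold_answer:
            r += 1.0  # accuracy reward: final answer matches
    return r
```

Because the reward is computed mechanically, the model can be trained on a large pool of verifiable problems without human labels, which is what makes scaling the RL stage tractable.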