
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 3 comments
1. janalsncm No.42953398
I think a lot of people in the ML community were excited for Noam Brown to lead the o-series at OpenAI because, intuitively, many reasoning problems are highly nonlinear, i.e., they have a tree-like structure, so some kind of Monte Carlo Tree Search (MCTS) should work well. OpenAI's o1/o3 don't seem to use this, and DeepSeek explicitly mentioned difficulties training such a model.
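For readers who haven't seen MCTS, here is a minimal sketch of the kind of tree search the comment has in mind: classic UCT (selection by UCB1, expansion, random rollout, backpropagation) on a hypothetical toy problem where each "reasoning step" is an arithmetic operation. The problem itself and all names (`TARGET`, `DEPTH`, `OPS`, `mcts`, `best_path`) are illustrative assumptions; this is not how o1/o3 or DeepSeek-R1 actually work.

```python
import math
import random

# Hypothetical toy problem: start from 1, apply exactly DEPTH operations,
# and score 1.0 if the final number equals TARGET (0.0 otherwise).
TARGET = 10
DEPTH = 4
OPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


class Node:
    def __init__(self, value, depth, parent=None, op=None):
        self.value = value    # current number (the partial "solution")
        self.depth = depth    # number of operations applied so far
        self.parent = parent
        self.op = op          # operation that produced this node
        self.children = {}    # op name -> Node
        self.visits = 0
        self.total = 0.0      # sum of rewards backed up through this node

    def is_terminal(self):
        return self.depth == DEPTH

    def is_fully_expanded(self):
        return len(self.children) == len(OPS)


def reward(value):
    return 1.0 if value == TARGET else 0.0


def uct_select(node, c=1.4):
    # UCB1: mean reward plus an exploration bonus for rarely visited children.
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(value, depth, rng):
    # Random playout to a terminal state; returns that state's reward.
    while depth < DEPTH:
        value = rng.choice(list(OPS.values()))(value)
        depth += 1
    return reward(value)


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(1, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.is_terminal() and node.is_fully_expanded():
            node = uct_select(node)
        # 2. Expansion: attach one untried child.
        if not node.is_terminal():
            op = rng.choice([o for o in OPS if o not in node.children])
            child = Node(OPS[op](node.value), node.depth + 1, node, op)
            node.children[op] = child
            node = child
        # 3. Simulation: estimate the new node's value with a random rollout.
        r = rollout(node.value, node.depth, rng)
        # 4. Backpropagation: update statistics from the node up to the root.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    return root


def best_path(root):
    # Read out the answer by following the most-visited child at each level.
    path, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        path.append(node.op)
    return path, node.value


path, value = best_path(mcts())
print(path, value)
```

Only two of the 16 four-step sequences hit 10, and both end in `*2, +1, *2`, so the visit counts should concentrate on that branch. The random rollout in step 3 is the weak point for real reasoning traces, where a random continuation is almost never correct; that is where a learned value model would come in.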

However, I think this is coming. DeepSeek mentioned it was hard to learn a value model for MCTS from scratch, but this doesn’t mean we couldn’t seed it with some annotated data.
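A minimal sketch of the "seed it with annotated data" idea, under loudly stated assumptions: the same hypothetical toy problem as a plain MCTS example, a `TabularValueModel` standing in for a neural value network, and "annotations" simulated by an exhaustive solver (`solvable`) rather than real human labels. The value model is fitted on labeled partial solutions first, then used to score leaves in place of random rollouts, roughly in the AlphaZero style. None of this reflects DeepSeek's actual setup; all names are illustrative.

```python
import math
import random
from collections import defaultdict

# Same hypothetical toy problem: from 1, apply exactly DEPTH ops to reach TARGET.
TARGET, DEPTH = 10, 4
OPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


def solvable(value, depth):
    # Exhaustive check, used here ONLY to simulate human annotators.
    if depth == DEPTH:
        return value == TARGET
    return any(solvable(f(value), depth + 1) for f in OPS.values())


def make_annotations(n, rng):
    # Hypothetical annotated data: random partial solutions (value, depth),
    # each labeled 1.0 if it can still reach TARGET and 0.0 otherwise.
    data = []
    for _ in range(n):
        value, depth = 1, 0
        for _ in range(rng.randint(0, DEPTH)):
            value = rng.choice(list(OPS.values()))(value)
            depth += 1
        data.append(((value, depth), 1.0 if solvable(value, depth) else 0.0))
    return data


class TabularValueModel:
    # Stand-in for a neural value network: averages the labels seen for each
    # state and returns a neutral prior for states missing from the data.
    def __init__(self, prior=0.5):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        self.prior = prior

    def fit(self, data):
        for state, label in data:
            self.sums[state] += label
            self.counts[state] += 1
        return self

    def __call__(self, state):
        n = self.counts.get(state, 0)
        return self.sums[state] / n if n else self.prior


def apply_ops(path):
    value = 1
    for op in path:
        value = OPS[op](value)
    return value


def mcts_with_value(value_fn, iterations=300, c=1.4):
    # Statistics keyed by the tuple of ops taken from the root.
    visits, totals = defaultdict(int), defaultdict(float)
    for _ in range(iterations):
        path = ()
        # Selection + expansion: follow UCB1 until reaching an unvisited node.
        while len(path) < DEPTH and visits[path] > 0:
            parent = path
            unvisited = [op for op in OPS if visits[parent + (op,)] == 0]
            if unvisited:
                path = parent + (unvisited[0],)
                break
            path = max(
                (parent + (op,) for op in OPS),
                key=lambda p: totals[p] / visits[p]
                + c * math.sqrt(math.log(visits[parent]) / visits[p]),
            )
        value, depth = apply_ops(path), len(path)
        # Evaluation: exact reward at terminal states, the value model elsewhere
        # (this replaces the random rollout of plain MCTS).
        if depth == DEPTH:
            v = 1.0 if value == TARGET else 0.0
        else:
            v = value_fn((value, depth))
        # Backpropagation along every prefix of the path, root included.
        for i in range(len(path) + 1):
            visits[path[:i]] += 1
            totals[path[:i]] += v
    return visits


def best_path(visits):
    path = ()
    while len(path) < DEPTH:
        prefix = path
        path = max((prefix + (op,) for op in OPS), key=lambda p: visits[p])
    return list(path)


model = TabularValueModel().fit(make_annotations(200, random.Random(0)))
path = best_path(mcts_with_value(model))
print(path, apply_ops(path))
```

The design point is that the value model starts from labeled data instead of from nothing, so even early searches get a useful signal at non-terminal leaves. In a real system the model would be a network over the partial reasoning trace and would presumably keep training on search outcomes afterwards; that refinement step is not shown here.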

2. insane-c0der No.42953965
Do you have a reference we can check for "DeepSeek explicitly mentioned difficulties training such a model"?
3. janalsncm No.42954882
Section 4.2, "Unsuccessful Attempts", of the DeepSeek-R1 paper:

https://arxiv.org/pdf/2501.12948