> In s1, when the LLM tries to stop thinking with "</think>", they force it to keep going by replacing it with "Wait". It'll then begin to second-guess and double-check its answer. They do this to trim or extend thinking time (trimming is just abruptly inserting "</think>").
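A minimal sketch of what that intervention looks like in a decoding loop (the names here are hypothetical, e.g. `model.generate_tokens`; this is not the actual s1 code, just the idea as I understand it):

```python
# Hypothetical sketch of s1-style "budget forcing" during decoding.
# Assumes a streaming token generator; not the real s1 implementation.

THINK_END = "</think>"

def generate_with_budget(model, prompt, min_tokens, max_tokens):
    output = []
    for token in model.generate_tokens(prompt):
        # Trim: once over budget, abruptly end the thinking phase.
        if len(output) >= max_tokens:
            output.append(THINK_END)
            break
        # Extend: if the model tries to stop too early, swap the
        # end-of-thinking marker for "Wait" so it keeps reasoning
        # and starts second-guessing its own answer.
        if token == THINK_END and len(output) < min_tokens:
            output.append("Wait")
            continue
        output.append(token)
        if token == THINK_END:
            break
    return "".join(output)
```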
I know some are really opposed to anthropomorphizing here, but this feels eerily similar to the way humans work, i.e. if you just dedicate more time to analyzing and thinking about the task, you are more likely to find a better solution.
It also feels analogous to searching a tree: the more time you have to explore the nodes, the more of the space you'll have covered, and hence the higher the chance of finding a near-optimal solution.
At the same time, if you have "better intuition" (better training?), you might be able to find a good solution faster, without needing to think much about it at all.