> The finding that language models can get better by generating longer outputs directly contradicts Yann’s hypothesis. I think the flaw in his logic comes from the idea that errors must compound per-token. Somehow, even if the model makes a mistake, it is able to correct itself and decrease the sequence-level error rate.
I don’t think current LLM behavior is primarily due to self-correction; it owes more to the availability of internet-scale data. That said, reasoning models are clearly being built toward self-correction. The problem, I think, is that even reasoning models are rote: they lack information synthesis, which in biological organisms comes from the interplay between short-term and long-term memory. I’m looking forward to LLMs that move past rote, mechanical answering and reasoning.
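To make the compounding-error argument concrete, here’s a rough toy simulation (the numbers and the two-parameter model are mine, purely illustrative, not how any real model behaves): treat each token as independently introducing an error with some probability, and give the model a fixed chance per step of noticing and repairing an existing mistake. With no repair, sequence-level correctness decays roughly as (1 − p_err)^n, which is the heart of the compounding claim; even a small repair probability keeps it from collapsing.

```python
import random

def sequence_success(n_tokens, p_err, p_fix, trials=20_000):
    """Estimate the chance a generation ends 'correct' after n_tokens.

    Toy model (illustrative only): each token independently introduces an
    error with probability p_err; at each step, if the sequence is currently
    wrong, the model notices and repairs it with probability p_fix.
    """
    successes = 0
    for _ in range(trials):
        wrong = False
        for _ in range(n_tokens):
            if not wrong and random.random() < p_err:
                wrong = True   # an error slips in
            elif wrong and random.random() < p_fix:
                wrong = False  # a later step corrects it
        if not wrong:
            successes += 1
    return successes / trials

# No self-correction: correctness decays roughly as (1 - p_err)**n.
print(sequence_success(500, p_err=0.01, p_fix=0.0))   # ~0.007
print((1 - 0.01) ** 500)                              # ~0.0066
# A modest per-step repair probability keeps it from collapsing (~0.8 here).
print(sequence_success(500, p_err=0.01, p_fix=0.05))
```

Real models obviously don’t have i.i.d. per-token errors or a fixed repair rate, but the toy version shows why “errors must compound” only holds if the model never recovers from a mistake.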