I guess this is somewhat expected? The current frontier models probably already have exhausted most of the entropy in the training data accumulated over decades and the new training data is very sparse. And the current mainstream architectures are not capable of sophisticated searching and planning, essential aspects for generating new entropy out of thin air. o1 was an interesting attempt to tackle this problem, but we probably still have a long way to go.