I've seen research that shows that starting with reasoning models, and fine-tuning to slowly remove the reasoning steps, allows you to bake the reasoning directly into the model weights in a strong sense. Here's a recent example, and you can see the digits get baked into a pentagonal prism in the weights, allowing accurate multi-digit multiplication without needing notes: https://arxiv.org/abs/2510.00184. So, reasoning and tool use could be the first step, to collect a ton of training data to do something like this fine-tuning process.
replies(3):