
60 points | QueensGambit | 1 comment | source
QueensGambit ◴[] No.45683114[source]
Hi HN, OP here. I'd appreciate feedback from folks with deep model knowledge on a few technical claims in the essay. I want to make sure I'm getting the fundamentals right.

1. On o1's arithmetic handling: I claim that when o1 multiplies large numbers, it generates Python code (executed as a tool call) rather than calculating internally in its weights. I don't have full transparency into o1's internals, so is this accurate? (A toy sketch of the pattern I mean is at the end of this comment.)

2. On model stagnation: I argue that fundamental model capabilities (especially code generation) have plateaued, and that tool orchestration is masking this. Do folks with hands-on experience building/evaluating models agree?

3. On alternative architectures: I suggest graph transformers that preserve semantic meaning at the word level as one possible path forward. For those working on novel architectures: what approaches look promising? Are graph-based architectures, sparse attention, or hybrid systems actually being pursued seriously in research labs? (See the sketch just below for the kind of layer I mean.)
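
To make that concrete, here's the kind of layer I have in mind: a minimal single-head graph-attention block in the style of GAT (Velickovic et al., 2018). This is purely illustrative, my own sketch rather than anything from a lab; attention is masked to the graph's edges, so each word/node only mixes with its neighbors:

    import torch
    import torch.nn.functional as F

    class GraphAttention(torch.nn.Module):
        # Minimal single-head GAT-style layer. Illustrative sketch only.
        def __init__(self, d_in, d_out):
            super().__init__()
            self.W = torch.nn.Linear(d_in, d_out, bias=False)
            self.a = torch.nn.Linear(2 * d_out, 1, bias=False)

        def forward(self, x, adj):            # x: (n, d_in), adj: (n, n) 0/1
            h = self.W(x)                     # (n, d_out)
            n = h.size(0)
            # Raw score e_ij from each concatenated pair [h_i || h_j].
            pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                               h.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)
            # Mask non-edges so the softmax only spans graph neighbors.
            e = e.masked_fill(adj == 0, float("-inf"))
            return torch.softmax(e, dim=-1) @ h

(adj should include self-loops, i.e. ones on the diagonal, or isolated rows softmax to NaN.)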

Would love to know your thoughts!
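
(For concreteness on question 1: I obviously can't see o1's internals, but the behavior I'm describing is the model emitting something like the snippet below and reading the result back from stdout, rather than producing the digits token by token. The numbers are just an example.)

    # Illustrative only: what a tool-using model might emit for
    # "what is 123456789 * 987654321?"
    a = 123456789
    b = 987654321
    print(a * b)  # 121932631112635269 -- exact, unlike sampled digits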

replies(10): >>45686080 #>>45686164 #>>45686265 #>>45686295 #>>45686359 #>>45686379 #>>45686464 #>>45686479 #>>45686558 #>>45686559 #
1. mxkopy ◴[] No.45686559[source]
Not affiliated with anyone, but I think the likes of OptNet (differentiable constrained optimization as a network layer) are soon going to play a role in developing AI with precise deductive reasoning.
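
To sketch what I mean: OptNet (Amos & Kolter, 2017) embeds a QP solver as a layer and differentiates through its KKT conditions. A stripped-down, equality-constraints-only version of that idea, written from memory rather than from their code, fits in a few lines of PyTorch:

    import torch

    def qp_layer(Q, q, A, b):
        # Solve  min_z 0.5 z^T Q z + q^T z   s.t.  A z = b
        # via the KKT linear system. torch.linalg.solve is differentiable,
        # so gradients flow through the *solution* back to Q, q, A, b.
        m = A.shape[0]
        K = torch.cat([torch.cat([Q, A.T], dim=1),
                       torch.cat([A, torch.zeros(m, m)], dim=1)], dim=0)
        sol = torch.linalg.solve(K, torch.cat([-q, b]))
        return sol[:Q.shape[0]]               # primal solution z*

    # Differentiably project x onto the hyperplane sum(z) = 1:
    x = torch.randn(4, requires_grad=True)
    z = qp_layer(torch.eye(4), -x, torch.ones(1, 4), torch.ones(1))
    (z ** 2).sum().backward()                 # gradients reach x through the solver

The real OptNet also handles inequality constraints via implicit differentiation of the full KKT system, which is where the interesting deductive structure shows up.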

More broadly, I think what we’re looking for at the end of the day, AGI, is going to come about from a diaspora of methods capturing the diverse aspects of what we recognize as intelligence. ‘Precise deductive reasoning’ is one capability out of many. Attention isn’t all you need, and neither is compression, convex programming, or what have you. The perceived “smoothness” or “unity” of our intelligence is an illusion, like virtual memory hiding the cache hierarchy underneath, and building it is going to look a lot more like stitching these capabilities together than deriving some deep and elegant equation.