QueensGambit (No.45683114)
Hi HN, OP here. I'd appreciate feedback from folks with deep model knowledge on a few technical claims in the essay. I want to make sure I'm getting the fundamentals right.

1. On o1's arithmetic handling: I claim that when o1 multiplies large numbers, it generates Python code and executes it rather than calculating internally (see the sketch at the end of this comment). I don't have full transparency into o1's internals. Is this accurate?

2. On model stagnation: I argue that fundamental model capabilities (especially code generation) have plateaued, and that tool orchestration is masking this. Do folks with hands-on experience building/evaluating models agree?

3. On alternative architectures: I suggest graph transformers that preserve semantic meaning at the word level as one possible path forward. For those working on novel architectures - what approaches look promising? Are graph-based architectures, sparse attention, or hybrid systems actually being pursued seriously in research labs?

Would love to know your thoughts!
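
For concreteness on claim 1, here is a toy sketch of the pattern I mean: the model emits Python source, and an interpreter – not the model's weights – produces the product. The `generated` string is purely illustrative, not a capture of o1's actual output:

    # Toy illustration of "the model writes code instead of multiplying".
    # The string below stands in for model output; exec() plays the role
    # of the provider's code-execution sandbox.
    generated = "result = 987654321 * 123456789"

    namespace = {}
    exec(generated, namespace)    # the interpreter does the arithmetic
    print(namespace["result"])    # 121932631112635269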

mirekrusin (No.45686479)
A reasoning model doesn't imply tool calling – the two shouldn't be conflated.

Reasoning just means implicit chain-of-thought. It can be emulated with a non-reasoning model by explicitly constructing a prompt that walks it through a longer step-by-step thought process. With reasoning models it just happens implicitly, and some models allow control over reasoning effort through special tokens. These models are simply fine-tuned to do it themselves, without explicit prompting from the user.
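
A minimal sketch of that emulation – `complete` is a hypothetical stand-in for any non-reasoning chat model's API:

    def build_cot_prompt(question: str) -> str:
        # Explicitly ask for the step-by-step process that a reasoning
        # model is fine-tuned to produce on its own.
        return (
            "Think through the problem step by step, showing each "
            "intermediate result, then give the final answer on its "
            "own line prefixed with 'Answer:'.\n\n"
            f"Problem: {question}"
        )

    # answer = complete(build_cot_prompt("What is 24 * 17 + 9?"))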

Tool calling happens primarily on the client side. The research/web-access modes etc. that some providers offer (built on tool calling they handle server-side) are not a property of the model; they can be enabled on any model.
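
A minimal version of that client-side loop – `chat(messages)` is a hypothetical stand-in for any chat-model API, and the JSON tool-request format is an assumption of the sketch:

    import json

    def run_tool(name, args):
        # Tools live entirely on the client; the model never executes them.
        if name == "multiply":
            return args["a"] * args["b"]
        raise ValueError(f"unknown tool: {name}")

    def tool_loop(chat, messages):
        # Assumed convention: the model signals a tool request with JSON
        # like {"tool": "multiply", "args": {"a": 3, "b": 4}}; any other
        # reply is treated as the final answer.
        while True:
            reply = chat(messages)
            try:
                request = json.loads(reply)
            except (TypeError, ValueError):
                return reply
            if not isinstance(request, dict) or "tool" not in request:
                return reply
            result = run_tool(request["tool"], request["args"])
            # The client ran the tool; hand the result back to the model.
            messages.append({"role": "tool", "content": json.dumps(result)})

Because the whole loop lives outside the model, it can be wrapped around any model that can be prompted to emit the request format.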

Nothing has plateaued from where I'm standing – new models are being trained, releases ship frequently and get integrated impressively fast, new models outperform previous ones, and models keep gaining capabilities such as multimodality.

Regarding alternative architectures – new ones are proposed all the time; it's not easy to verify them all at scale. Some ideas that extend the current state-of-the-art architectures do end up in frontier models, but training takes time, so there is a lag. A lot of improvements are also kept hidden from the public by commercial companies.
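
On your question 3: "graph transformer" usually means attention restricted to the edges of a graph over the tokens (e.g. a dependency parse) instead of all pairs. A toy NumPy sketch of that masking – real graph transformers add learned projections, multiple heads, and edge features:

    import numpy as np

    def graph_attention(x, adj):
        # Single-head attention where token i may only attend to j when
        # adj[i, j] == 1. x: (n, d) embeddings; adj: (n, n) 0/1 matrix
        # with self-loops so every row has at least one edge.
        scores = x @ x.T / np.sqrt(x.shape[1])
        scores = np.where(adj == 1, scores, -np.inf)   # keep graph edges
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ x                             # graph-masked mixing

    # Three tokens on a chain 0-1-2, self-loops included.
    x = np.random.randn(3, 4)
    adj = np.array([[1, 1, 0],
                    [1, 1, 1],
                    [0, 1, 1]])
    print(graph_attention(x, adj).shape)   # (3, 4)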