
QueensGambit
Hi HN, OP here. I'd appreciate feedback from folks with deep model knowledge on a few technical claims in the essay. I want to make sure I'm getting the fundamentals right.

1. On o1's arithmetic handling: I claim that when o1 multiplies large numbers, it generates Python code rather than calculating the product internally (see the sketch after this list). I don't have full transparency into o1's internals. Is this accurate?

2. On model stagnation: I argue that fundamental model capabilities (especially code generation) have plateaued, and that tool orchestration is masking this. Do folks with hands-on experience building/evaluating models agree?

3. On alternative architectures: I suggest graph transformers that preserve semantic meaning at the word level as one possible path forward. For those working on novel architectures: what approaches look promising? Are graph-based architectures, sparse attention, or hybrid systems actually being pursued seriously in research labs?
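
For point 1, here is the pattern I have in mind, as a minimal, self-contained sketch. No real model or API is called: generated_code is a hard-coded stand-in for the kind of snippet I believe o1 emits for a code-execution tool, and run_tool stands in for that tool.

    import io, contextlib

    # Hypothetical stand-in for what the model emits: a Python snippet for a
    # code-execution tool, instead of the product written out digit by digit.
    generated_code = "print(987654321987654321 * 123456789123456789)"

    def run_tool(code: str) -> str:
        """Execute the generated snippet and capture stdout (no sandboxing; demo only)."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return buf.getvalue().strip()

    print("tool result :", run_tool(generated_code))
    print("ground truth:", 987654321987654321 * 123456789123456789)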

Would love to know your thoughts!

XenophileJKO
Point 2 is 1000% not true. The models have gotten better at the overall act of coding, and they have also gotten WAY better at USING tools. This isn't about tool orchestration frameworks; it's about knowing how and when to use tools effectively, and that skill lives largely inside the model. I would also say this is a fundamental model capability.

This improved think->act->sense loop that they now form dramatically increases the possible utility of the models. We are just starting to see this with GPT-5 and the Claude 4+ series of models.
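
To be concrete about what I mean by the loop, here is a toy sketch. The tools are stubs and fake_model is a hard-coded policy; in the real systems, deciding when to act and which tool to call is done by the LLM itself, which is the capability I'm pointing at.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Optional

    # Stubbed tools; real agents expose shells, editors, search, etc.
    TOOLS: Dict[str, Callable[[str], str]] = {
        "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    }

    @dataclass
    class Step:
        thought: str
        tool: Optional[str]        # None means "answer now"
        tool_input: Optional[str]

    def fake_model(task: str, observations: List[str]) -> Step:
        """Hard-coded policy standing in for the model's own tool-use decision."""
        if not observations:
            return Step("I should compute this precisely.", "calculator", task)
        return Step(f"The tool returned {observations[-1]}.", None, None)

    def agent_loop(task: str, max_steps: int = 5) -> str:
        observations: List[str] = []
        for _ in range(max_steps):
            step = fake_model(task, observations)       # think
            if step.tool is None:
                return step.thought                     # done
            result = TOOLS[step.tool](step.tool_input)  # act
            observations.append(result)                 # sense
        return "<step budget exhausted>"

    print(agent_loop("123456789 * 987654321"))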

emp17344
Yes, the models have gotten better at using tools, but tech companies have also poured an insane amount of money into improving the tools and integrating them with LLMs. Is the gain because the models themselves have actually improved, or because the tools and integration methods have improved? I don't think anyone actually knows.
XenophileJKO
The models have improved. They are using "arbitrary tools" better.
emp17344
I don’t know what you mean, because arbitrary tools don’t integrate with LLMs in the first place. Are you referring to MCP?
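
To spell out why I'm asking: as far as I know, a tool only becomes usable by an LLM once someone describes it declaratively (a name, a description, and a JSON-Schema-style parameter spec) and writes the dispatch glue, whether through a vendor's function-calling API or an MCP server. A rough, vendor-neutral sketch with made-up names:

    import json

    # Declarative description handed to the model; names here are made up.
    weather_tool = {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def get_weather(city: str) -> str:
        return f"{city}: 18C, overcast"          # stubbed implementation

    DISPATCH = {"get_weather": get_weather}

    # Stand-in for the structured call a model emits after seeing the schema above.
    model_tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Berlin"})}

    args = json.loads(model_tool_call["arguments"])
    print(DISPATCH[model_tool_call["name"]](**args))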