60 points QueensGambit | 3 comments
Terr_ ◴[] No.45686036[source]
> These are not model improvements. They're engineering workarounds for models that stopped improving.

One might characterize it as an improvement in the style of document the model operates on.

My favorite barely-a-metaphor is that the "AI" interaction is based on a hidden document that looks like a theater script, where characters User and Bot are having a discussion. Periodically, the make_document_longer(doc) function (the stateless LLM) is invoked to complete more Bot lines. An orchestration layer performs the Bot lines towards the (real) user, and transcribes the (real) user's submissions into User dialogue.
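A minimal sketch of that loop in Python (make_document_longer is the name from the metaphor; everything else, including the stubbed completion, is hypothetical):

```python
def make_document_longer(doc: str) -> str:
    """Stateless 'LLM': given the script so far, append the next Bot lines.
    Stubbed here; a real system would call a model API instead."""
    return doc + "Bot: Hello! How can I help?\n"


def orchestrate(user_inputs):
    """Hidden theater-script loop: transcribe User lines, have the LLM
    extend the script, and 'perform' only the newly appended Bot lines."""
    doc = ""
    for text in user_inputs:
        doc += f"User: {text}\n"            # transcribe the real user's input
        longer = make_document_longer(doc)  # stateless completion of the script
        new_part = longer[len(doc):]        # only the freshly appended lines
        doc = longer
        for line in new_part.splitlines():
            if line.startswith("Bot: "):
                print(line[len("Bot: "):])  # performed toward the real user


orchestrate(["Hi there"])  # prints: Hello! How can I help?
```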

Recent improvements? Still a theater-script, but:

1. Reasoning - The Bot character is a film-noir detective with a constant internal commentary, not typically "spoken" to the User character and thus not "performed" by the orchestration layer: "The case was trouble, but I needed to make rent, and to do that I had to remember it was Georgia the state, not the country."

2. Tools - There are more stage-directions, such as "Bot uses [CALCULATOR] inputting [sqrt(5)*pi] and getting [PASTE_RESULT_HERE]". Ordinary programs parse the script, run the tools, and substitute the results back in.

Meanwhile, the fundamental architecture and the make_document_longer(doc) function haven't changed as much, hence the author's title of "not model improvements."

replies(1): >>45688199 #
1. QueensGambit ◴[] No.45688199[source]
Exactly. Both the theater script and code are metadata that manipulate entities: characters in a play or variables in memory. There's definitely abstract-level understanding emerging: that's why models can be trained on Python but write code in Java. That could be instructions like pseudocode, or the hidden document/theater script you mentioned. The capability jump from GPT-3 to o1 is real. But my point is: pure metadata manipulation has hit a ceiling, or is moving at a crawling pace, since o1. The breakthrough applications (like agentic AI) still depend on the underlying model's ability to generate accurate code. When that capability plateaus, all the clever orchestration on top of it plateaus too.
replies(1): >>45688536 #
2. Terr_ ◴[] No.45688536[source]
Just to confirm, as this topic gets "very meta" with levels of indirection, it sounds like you mean the LLM appends a "fitting" document fragment like:

    This was an unusual task Bot wasn't sure how to solve directly.
    Bot decided it needed to execute a program:
      [CODE_START]foo(bar(baz()))[CODE_END]
    Which resulted in
      [CODE_RESULT_PLACEHOLDER]

This stage-direction is externally parsed, executed, and substituted, and then the LLM is called upon to generate Bot-character's next reaction.
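A hedged sketch of that parse-execute-substitute step (the [CODE_START]/[CODE_END]/[CODE_RESULT_PLACEHOLDER] markers come from the fragment above; run_tool and the regex are my own assumptions, with a toy arithmetic evaluator standing in for a real tool):

```python
import re


def run_tool(code: str) -> str:
    """Hypothetical tool runner; here it just evaluates arithmetic
    with builtins stripped. A real system would dispatch to sandboxed tools."""
    return str(eval(code, {"__builtins__": {}}, {}))


def substitute_tool_results(doc: str) -> str:
    """Find a [CODE_START]...[CODE_END] stage-direction, execute it,
    and fill in the result placeholder."""
    m = re.search(r"\[CODE_START\](.*?)\[CODE_END\]", doc, re.S)
    if m:
        result = run_tool(m.group(1))
        doc = doc.replace("[CODE_RESULT_PLACEHOLDER]", result)
    return doc


script = (
    "Bot decided it needed to execute a program:\n"
    "  [CODE_START]2 + 3 * 4[CODE_END]\n"
    "Which resulted in\n"
    "  [CODE_RESULT_PLACEHOLDER]\n"
)
print(substitute_tool_results(script))  # placeholder replaced with 14
```

After the substitution, the whole document goes back through the LLM to generate the Bot character's next reaction, as you describe.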

In terms of how this could go wrong, it makes me think of a meme:

> Thinking quickly, Dave constructs a homemade megaphone, using only some string, a squirrel, and a megaphone.

replies(1): >>45691369 #
3. QueensGambit ◴[] No.45691369[source]
Yes, my understanding is:

- finding patterns in data is memorization

- finding patterns in metadata is intelligence

- finding patterns in meta-metadata is invention

For example, if you ask someone to hang a painting in an art gallery 12 feet from the floor using a 13-foot ladder:

- a worker will use the safety rule of staying 5 feet away from the wall. This is what GPT-3 does. [1]

- an engineer will apply the Pythagorean theorem. This is what o3 does.

- Pythagoras, seeing it for the first time, will derive the theorem. GPT-5 is nowhere close to that.
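For what it's worth, the worker's rule and the engineer's theorem agree here: a quick check (not from the original, just arithmetic) shows the 13-foot ladder reaching 12 feet up forms a 5-12-13 right triangle.

```python
import math

ladder = 13.0   # hypotenuse: ladder length in feet
height = 12.0   # vertical leg: height of the painting in feet

# Pythagorean theorem: base = sqrt(ladder^2 - height^2)
base = math.sqrt(ladder**2 - height**2)
print(base)  # 5.0 -- matches the worker's "stay 5 feet from the wall" rule
```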

This climbing up the ladder of abstraction existed even before LLMs. DeepMind's AlphaGo learned from human games. But AlphaGo Zero and AlphaZero trained entirely through self-play and began uncovering new strategies across Go, chess, and shogi. So whether it's code, a game, or pseudocode, they're all metadata operating at the same level of abstraction.

[1] The Nature of Intelligence is Meta - https://manidoraisamy.com/intelligence-is-meta.html