This is often the case but does not _have_ to be so. LLMs can use chain of thought to “talk out loud” and “do the work”. They can use supplementary documents and iterate on their output. The quality of course varies, but it is getting better. When I read Gemini 2.5’s “thinking” notes, it can indeed build up text that is not directly present in its training data.
Putting aside anthropocentric definitions of “reasoning” and “consciousness” is key to how I think about the issues here. I’m intentionally steering completely clear of consciousness.
Modern SOTA LLMs are indeed getting better at what people call “reasoning”. We don’t need to quibble over defining some quality bar; that is probably context-dependent and maybe even arbitrary.
It is clear LLMs are doing better at “reasoning” — I’m using quotes to emphasize that (to me) it doesn’t matter if their inner mechanisms for reasoning don’t look like human ones. Instead, run experiments and look at the results.
We’re not talking about the hard problem of consciousness; we’re talking about something that can indeed be measured: roughly speaking, the ability to derive new truths from existing ones.
(Because this topic is charged and easily misunderstood, let me clarify some questions that I’m not commenting on here: How far can transformer-based models take us? Are data- and power-hungry AI models cost-effective? What viable business plans exist? How much short-term risk to, say, employment and cybersecurity? How much long-term risk to human values, security, thriving, and self-determination?)
Even if you disagree with parts of my characterization above, hear this: we should at least be honest with ourselves when we move the goalposts.
Don’t mistake my tone for zealotry. I’m open to careful criticism. If you do criticize, please don’t try to lump me into one “side” on the topic of AI, whether that’s market conditions, commercialization, safety, or research priorities; you probably don’t know me well enough to do that (yet). Apologies for the pre-defensive posture, but the convos here are often … fraught, so I’m trying to head off some of the usual styles of reply.