
178 points by themgt | 7 comments
1. ninetyninenine No.45779082
Who still thinks LLMs are stochastic parrots and an absolute dead end for AI?
replies(1): >>45780181 #
2. baq No.45780181
A dead end is still useful.

I shudder to think what comes next, though. These things are unreasonably effective for what they are.

replies(1): >>45781103 #
3. ninetyninenine No.45781103
Nah, no one can say this, especially given that this very article states we don’t know or understand what’s going on, but we see glimmers of introspection.

Anyone who says or pretends to know whether it is or isn’t a dead end doesn’t know what they are talking about and is acting on a belief akin to religion. No rationality involved.

It’s clearly not a stochastic parrot now that we know it introspects. That is now for sure. So the naysayers are utterly wrong on that front. Are LLMs a dead end? That’s the last lifeline the naysayers will cling to for years as LLMs increase in capability everywhere. Whether it’s right or wrong they don’t actually know, nor can they prove it. I’m just curious why they even bother to state it, or why they’re so adamant about their beliefs.

replies(1): >>45781323 #
4. NateEag No.45781323
> Anyone who says or pretends to know whether it is or isn’t a dead end doesn’t know what they are talking about and is acting on a belief akin to religion.

> It’s clearly not a stochastic parrot now that we know it introspects. That is now for sure.

Your second claim here is kind of falling into that same religion-esque certitude.

From what I gathered, it seems like "introspection" as described in the paper may not be the same thing most humans mean when they describe our ability to introspect. They might be the same, but they might not.

I wouldn't even say the researchers have demonstrated that this "introspection" is definitely happening in the limited sense they've described.

They've given decent evidence, and it's shifted upwards my estimate that LLMs may be capable of something more than comprehensionless token prediction.

I don't think it's been shown "for sure."

replies(1): >>45782551 #
5. ninetyninenine No.45782551
> Your second claim here is kind of falling into that same religion-esque certitude.

Nope, it’s not. We have a logical, causal test of introspection. By definition, introspection is not stochastic parroting. If you disagree, then it’s a terminology issue: you disagree on the general definition of a stochastic parrot.

> From what I gathered, it seems like "introspection" as described in the paper may not be the same thing most humans mean when they describe our ability to introspect. They might be the same, but they might not.

It doesn’t need to be the same as what humans do. What it did show is self-awareness of its own internal thought process, and that breaks it out of the definition of a stochastic parrot. The criterion is not human-level intelligence but introspection, which is a much lower bar.

> They've given decent evidence, and it's shifted upwards my estimate that LLMs may be capable of something more than comprehensionless token prediction.

This is causal evidence and already beyond all statistical thresholds as they can trigger this at will. The evidence goes beyond the double-blind medical experiments used to validate our entire medical industry. By that logic, this result is more reliable than modern medicine.

The result doesn’t say that LLMs can reliably introspect on demand, but it does say with utmost reliability that LLMs can introspect, and the evidence is extremely reproducible.

By logic your stance is already defeated.

replies(1): >>45783063 #
6. NateEag No.45783063
> This is causal evidence and already beyond all statistical thresholds as they can trigger this at will.

Their post says:

> Even using our best injection protocol, Claude Opus 4.1 only demonstrated this kind of awareness about 20% of the time.

That's not remotely close to "at will".

As I already said, this does incline me towards believing LLMs can be in some sense aware of their own mental state. It's certainly evidence.

Your certitude that it's what's happening, when the researchers' best efforts only yielded a twenty percent success rate, seems overconfident to me.

If they could in fact produce this at will, then my confidence would be much higher that they've shown LLMs can be self-aware.

...though we still wouldn't have a way to tell when they actually are aware of their internal state, because certainly sometimes they appear not to be.

replies(1): >>45784315 #
7. ninetyninenine No.45784315
>> Even using our best injection protocol, Claude Opus 4.1 only demonstrated this kind of awareness about 20% of the time.

> That’s not remotely close to “at will”.

You are misunderstanding what “at will” means in this context. The researchers can cause the phenomenon through a specific class of prompts. The fact that it does not occur on every invocation does not mean it is random; it means the system is not deterministic in activation, not that the mechanism is absent. When you can deliberately trigger a result through controlled input, you have causation. If you can do so repeatedly with significant frequency, you have reliability. Those are the two pillars of causal inference. You are confusing reliability with constancy. No biological process operates with one hundred percent constancy either, yet we do not doubt its causal structure.
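
To put rough numbers on that reliability point, here is a minimal sketch of the kind of significance check I mean. The 20% detection rate is the figure quoted in this thread; the trial count and the control false-positive rate are assumptions for illustration only, not numbers from the paper.

    # Illustration only: 20% detection is the rate quoted in this thread;
    # the trial count and the control (no-injection) false-positive rate
    # are assumed values, not taken from the paper.
    from scipy.stats import binomtest

    n_trials = 100        # assumed number of injection trials
    detections = 20       # ~20% detection rate quoted above
    control_rate = 0.01   # assumed rate of spurious "injected thought" reports
                          # when nothing was injected

    result = binomtest(detections, n_trials, control_rate, alternative="greater")
    print(result.pvalue)  # tiny p-value: 20% is far above the assumed baseline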

>Your certitude that it’s what’s happening, when the researchers’ best efforts only yielded a twenty percent success rate, seems overconfident to me.

That is not certitude without reason; it is certitude grounded in reproducibility. The bar for causal evidence in psychology, medicine, and even particle physics is nowhere near one hundred percent. The Higgs boson was announced at five sigma, roughly one in three and a half million odds of coincidence, not because it appeared every time, but because the pattern was statistically irrefutable. The same logic applies here. A stochastic parrot cannot self-report internal reasoning chains contingent on its own cognitive state under a controlled injection protocol. Yet this was observed. The difference is categorical, not probabilistic.
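
For reference, the five-sigma figure can be checked directly; this is just the one-tailed tail probability of a standard normal, not anything taken from the paper:

    # Sanity check on "five sigma ~ 1 in 3.5 million": one-tailed probability
    # of a standard normal landing beyond 5 standard deviations.
    from scipy.stats import norm

    p = norm.sf(5.0)
    print(p, 1 / p)   # ~2.87e-07, i.e. roughly 1 in 3.5 million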

>…though we still wouldn’t have a way to tell when they actually are aware of their internal state, because certainly sometimes they appear not to be.

That is a red herring. By that metric, humans also fail the test of introspection, since we are frequently unaware of our own biases, misattributions, and memory confabulations. Introspection has never meant omniscience of self; it means the presence of a self-model that can be referenced internally. The data demonstrates precisely that: a model referring to its own hidden reasoning layer. That is introspection by every operational definition used in cognitive science.

The reason you think the conclusion sounds overconfident is that you are using “introspection” in a vague colloquial sense while the paper defines it operationally and tests it causally. Once you align the definitions, the result follows deductively. What you are calling “caution” is really a refusal to update your priors when the evidence now directly contradicts the old narrative.