1. I don't think the comment above gets to the "root" of the problem, which is "the LLM appears overconfident". Thankfully, that problem is relatively easy to address by trying different LLMs and different pre-prompts. Like I said, your results might vary.
2. While the question of "is the AI thinking" is interesting, I think it is a malformed question. Consider: how would you make progress on it, as stated? My take: it is unanswerable without considerable reframing toward something measurable. Here, I would return to the original question: to what degree does an LLM output calibrated claims? How often does it make overconfident claims? Underconfident claims?
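To make "calibration" concrete: log the model's stated confidence alongside whether the claim checked out, then compare the two. Here's a minimal sketch of one standard way to do that (expected calibration error); the function name and data shape are my own choices for illustration, not from any particular library:

```python
def expected_calibration_error(records, n_bins=10):
    """records: list of (stated_confidence in [0, 1], was_correct bool).

    Bins records by stated confidence, then returns the weighted
    average gap between stated confidence and observed accuracy.
    0.0 means perfectly calibrated; larger means over/underconfident.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, correct in records:
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))

    total = len(records)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# A model that says "90% sure" but is right only half the time is
# overconfident; the gap shows up directly (here, roughly 0.4).
sample = [(0.9, True), (0.9, False), (0.9, False), (0.9, True)]
print(expected_calibration_error(sample))
```

The point isn't this particular metric; it's that "overconfident" becomes a number you can track across models and prompts, instead of a vibe.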
3. Pretending requires at least metacognition, if not consciousness. Agree? It is a fascinating question to explore how much metacognition a particular LLM demonstrates.
In my view, this is still a research question, both in terms of understanding how LLM architectures work as well as designing good evals to test for metacognition.
In my experience, when using chain-of-thought, LLMs can be quite good at recognizing previous flaws, including overconfidence, meaning that if one is careful, the LLM behaves as if it has a decent level of metacognition. But to see this, the driver (the human) must demonstrate discipline. I'm skeptical that most people prompt LLMs rigorously and carefully.
4. It helps to discuss this carefully. Word choice matters a lot in AI discussions, much more than even a relatively capable software developer / hacker is comfortable with. Casual phrasings are likely to lead us astray. I'll make a stronger claim: a large fraction of successful tech people haven't yet developed clear language and thinking for discussing classic machine learning, much less AI as a field or LLMs in particular. And many of these people lack the awareness or mindset to remedy this; they fall into the usual overconfidence or lack-of-curiosity traps.
5. You wrote: "LLMs are performing well when they are putting out what you want to hear."
I disagree; instead, I claim people, upon reflection, would prefer an LLM be helpful, useful, and true. This often means correcting mistakes or challenging assumptions. Of course people have short-term failure modes; such is human nature. But when you look at most LLM eval frameworks, you'll see that truth and safety are primary factors. Yes-manning or sycophancy is still a problem.
6. Many of us have seen the "LLMs just parrot language" claim repeated many times. After having read many papers on LLMs, I wouldn't use the words "LLMs just parrot language". Why? That phrase is more likely to confuse discussion than advance it.
I recommend this to everyone: instead of using that phrase, challenge yourself to articulate at least two POVs relating to the "LLMs are stochastic parrots" argument.
Discuss with a curious friend or someone you respect. If it is just a stranger online, it is too easy to dismiss them out of hand.
The "stochastic parrot" phrase is fun and makes a catchy title for an AI researcher who wants to get their paper noticed. But it isn't a great phrase for driving mutual understanding, particularly not on a forum like HN where our LLM foundations vary widely.
Having said all this, if you want to engage on the topic at the object level, there are better fora than HN for it. I suggest starting with a literature review and finding an ML or AI-specific forum.
7. There is a lot of confusion and polarization around AI. We are capable of discussing it better, but (a) we have to want to; (b) we have to learn how; and (c) we have to make time to do it.
Like I wrote in #6, above, be mindful of where you are discussing and the level of understanding of the people around you. I've found HN to be middling on this, but I like to pop in from time to time to see how we're doing. The overconfidence and egos are strong here, arguably stronger than the culture and norms that should help us strive for true understanding.
8. These are my views only. I'm not "on one side", because I reject the false dichotomy that AI-related polarization might suggest.