
579 points by paulpauper | 1 comment
1. JKCalhoun No.43612374
I've suggested (from my lowly layman vantage point) that an LLM has some idea of the fidelity of its response to a query, even if only in broad strokes like, "This answer is tracking with high-probability numbers" or "This answer has very low correspondence with the training data".

To that end, the LLM could convey as much.
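
As a rough illustration of what I mean (a minimal sketch, not how any vendor actually does it; it assumes the OpenAI Python client, a "gpt-4o-mini" model name, and an arbitrary 0.7 threshold), per-token log-probabilities are one signal an API already exposes that could be turned into a hedge:

    import math
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model that returns logprobs works
        messages=[{"role": "user", "content": "What year was the MOS 6502 introduced?"}],
        logprobs=True,
    )

    tokens = resp.choices[0].logprobs.content
    # Mean per-token probability is only a crude, uncalibrated proxy for confidence.
    avg_prob = sum(math.exp(t.logprob) for t in tokens) / len(tokens)

    print(resp.choices[0].message.content)
    print(f"mean token probability: {avg_prob:.2f}")
    if avg_prob < 0.7:  # arbitrary threshold, purely illustrative
        print("(shaky ground: consider hedging or double-checking this answer)")

Token probability is not the same thing as factual accuracy, of course, but surfacing even that crude signal would be closer to the behavior I'm describing.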

(Anecdotal, sorry:) I was using Claude (the free tier) recently and noticed it hedging quite a bit where it hadn't before. Examples:

"Let me be careful about this response since we're discussing a very specific technical detail ..."

"Given how specific that technical detail is, I want to be transparent that while I aim to be accurate, I may hallucinate such precise historical specifications."

I confess my initial reaction was to ask ChatGPT instead, since its answers are more self-assured, ha ha. So perhaps corporate AI is not likely to try to solve this problem of the LLM telling the user when it is on shaky ground. Bad for business.