
277 points by simianwords | 1 comment
kingstnap | No.45150336
There is something deeply wrong with this paper that nobody seems to have mentioned:

The model head doesn't hallucinate. The sampler does.

If you ask an LLM when X was born and it doesn't know, and you then look at the actual model output, which is a probability distribution over tokens, the "I don't know" is cleanly represented as a roughly uniform probability over Jan 1 to Dec 31.

If you ask it a multiple-choice question and it doesn't know, it will say:

25% A, 25% B, 25% C, 25% D.

Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.

In reality we need something smarter than a random sampler to actually extract this information. The knowledge, and the lack of knowledge, is there; the sampler just produces bullshit out of it.
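
To make the claim concrete, here is a minimal sketch of the kind of logit inspection being described, for the multiple-choice case. It assumes a Hugging Face causal LM; the model name and prompt are placeholders, and the single-token handling of the answer letters is an assumption that a real setup would have to verify per tokenizer.

```python
# Minimal sketch (assumes a Hugging Face causal LM; model name and prompt are placeholders):
# read the next-token distribution directly instead of sampling from it, and treat a
# near-uniform spread over the answer letters as "the model doesn't know".
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Question: ...\nA) ...\nB) ...\nC) ...\nD) ...\nAnswer:"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits from the model head

probs = torch.softmax(logits, dim=-1)

# Probability mass the head assigns to each answer letter (assumes " A".." D" are single tokens).
choice_ids = [tok.encode(" " + c, add_special_tokens=False)[0] for c in "ABCD"]
choice_probs = probs[choice_ids]
choice_probs = choice_probs / choice_probs.sum()  # renormalize over the four options

# Entropy near log(4) ≈ 1.386 nats means a near-uniform split: the head is saying "I don't know".
entropy = -(choice_probs * choice_probs.log()).sum().item()
print({c: round(p.item(), 3) for c, p in zip("ABCD", choice_probs)})
print("entropy:", round(entropy, 3), "of max", round(math.log(4), 3))
```

A sampler that just draws one token from this distribution throws that signal away; a decision rule on top of it (e.g. abstain when the entropy is close to the maximum) is the "something smarter" being asked for.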

replies(4): >>45150445 >>45150464 >>45154049 >>45167516
ACCount37 | No.45150445
No, that's a misconception. It's not nearly that simple.

There are questions that show a palpable split in probability between the answers, with the logit distribution immediately exposing the underlying lack of confidence.

But there are also questions that cause an LLM to produce consistent-but-wrong answers: for example, the question got internally associated with a similar-but-not-the-same question, and that association was enough to put 93% on B, despite B being the wrong answer.

An LLM might even have some latent awareness of its own uncertainty here. But it has, for some reason, decided to proceed with a "best guess" answer, which happened to be wrong.
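
A hedged sketch of the counterpoint: confidently wrong answers only show up when you compare the head's confidence against ground truth, so logit inspection alone can't catch them. Here `answer_distribution` is a hypothetical helper (e.g. returning the renormalized A-D probabilities from the sketch above), and the questions and gold labels would come from your own eval set.

```python
# Sketch: find answers the head gives with high confidence that are nonetheless wrong.
# `answer_distribution` is a hypothetical callable mapping a question to {"A": p, ..., "D": p};
# `questions` and `gold_labels` are assumed to come from an eval set you supply.
def flag_confident_errors(questions, gold_labels, answer_distribution, threshold=0.9):
    """Return (question, picked_answer, confidence) for confident-but-wrong answers."""
    confident_errors = []
    for q, gold in zip(questions, gold_labels):
        dist = answer_distribution(q)          # e.g. {"A": 0.02, "B": 0.93, "C": 0.03, "D": 0.02}
        pick = max(dist, key=dist.get)         # the head's top choice
        if dist[pick] >= threshold and pick != gold:
            confident_errors.append((q, pick, dist[pick]))
    return confident_errors
```

If this list is non-empty, the lack of knowledge was not sitting in the logits waiting to be read off; the 93%-on-B case looks identical to a genuinely known answer until you check it against the gold label.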