There's a deeply wrong part of this paper that no one has mentioned:
The model head doesn't hallucinate. The sampler does.
If you ask an LLM when X was born and it doesn't know, and you look at the actual model output, which is a probability distribution over tokens, the "I don't know" is cleanly represented as a roughly uniform probability over dates from Jan 1 to Dec 31.
If you ask it a multiple-choice question and it doesn't know, it will say this:
25% A, 25% B, 25% C, 25% D.
Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.
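
To make that concrete, here's a minimal sketch (pure Python; the logits are made-up numbers, not outputs from any real model) of reading uncertainty straight off the next-token distribution. A flat spread over the answer tokens means maximal entropy, i.e. the model is telling you it doesn't know:

    import math

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def entropy_bits(probs):
        # Shannon entropy in bits: 0 = certain, log2(n) = clueless.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Hypothetical next-token logits over the answer tokens A/B/C/D.
    knows   = softmax([4.0, 1.0, 0.5, 0.2])  # peaked: the model knows
    no_idea = softmax([1.0, 1.0, 1.0, 1.0])  # flat: the model doesn't

    print(entropy_bits(knows))    # ~0.6 bits: confident
    print(entropy_bits(no_idea))  # 2.0 bits: the maximum for 4 options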
In reality we need something smarter than a random sampler to actually extract this information. The knowledge, and the lack of knowledge, is right there in the distribution; the sampler just turns it into bullshit.
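
A crude version of that "something smarter" (my sketch, with an arbitrary entropy threshold, not anything the paper proposes): check the entropy of the answer distribution before sampling, and abstain instead of committing when it's too flat.

    import random

    def sample_or_abstain(logits, options, max_bits=1.0):
        # Reuses softmax/entropy_bits from the sketch above.
        probs = softmax(logits)
        if entropy_bits(probs) > max_bits:
            return "I don't know"  # distribution too flat to trust
        return random.choices(options, weights=probs)[0]

    print(sample_or_abstain([4.0, 1.0, 0.5, 0.2], "ABCD"))  # 'A' (almost always)
    print(sample_or_abstain([1.0, 1.0, 1.0, 1.0], "ABCD"))  # "I don't know"

Real abstention methods are fancier (calibration, entropy over whole answers rather than a single token), but the signal is sitting right there before the sampler throws it away.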