
277 points simianwords | 2 comments
aleph_minus_one ◴[] No.45148555[source]
> Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.”

To me, this seems to be a "US-American" way of thinking about multiple-choice tests. Other common ways of grading multiple-choice tests that I have seen are:

1. If the test-taker knows that exactly one of N given choices is correct:

1.1 Give N-1 points for the correct answer and -1 (negative one) point for a wrong answer. This way, a test-taker who answers at random scores 0 points in expectation (see the sketch after this list).

1.2 A more brutal variant for N >= 3: the correct answer gives 1 point, every wrong answer gives -1 point. The lesson to learn: only give an answer if you are sure it is correct. (For N = 2, this grading is identical to 1.1.)

2. If there are possibly multiple correct answers, turn each item into a separate "yes"/"no" choice (with the option to give no answer). A correct choice gives you 1 point, a wrong one gives you -1 point (i.e., as in 1.1).
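
A quick sanity check of the expected values (a throwaway sketch in Python; the point values are the ones from 1.1 and 1.2):

    # Expected score per question when guessing uniformly among n choices.
    def expected_score(n, right_pts, wrong_pts):
        # One choice is correct (probability 1/n), the other n-1 are wrong.
        return (1 / n) * right_pts + ((n - 1) / n) * wrong_pts

    for n in (2, 3, 4, 5):
        print(n,
              expected_score(n, n - 1, -1),  # scheme 1.1: always 0.0
              expected_score(n, 1, -1))      # scheme 1.2: (2 - n) / n, negative for n >= 3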

replies(3): >>45148945 #>>45149423 #>>45163428 #
bananaflag ◴[] No.45149423[source]
This is mentioned in the text:

> This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing.

replies(1): >>45149611 #
throwawaymaths ◴[] No.45149611[source]
There's not really an easy way to train for that at scale. A "correct" answer may not be one token, there may be multiple synonymous answers starting with different tokens, and you could put five space tokens in front of the answer and it likely shouldn't make it "wrong".
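
To illustrate the whitespace point (a sketch assuming the tiktoken library; any BPE tokenizer behaves similarly): the same answer string maps to different token IDs depending on leading spaces, which is why graders generally compare normalized decoded text rather than token IDs:

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    print(enc.encode("Paris"))       # one token sequence
    print(enc.encode(" Paris"))      # a different token ID for the same word
    print(enc.encode("     Paris"))  # extra spaces: yet another sequence

    # So a workable grader compares normalized text, not tokens:
    def is_correct(model_output: str, gold: str) -> bool:
        return model_output.strip().lower() == gold.strip().lower()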
replies(1): >>45149968 #
ACCount37 ◴[] No.45149968[source]
Yes, it's not nearly as easy as "just fix the evals".

But better evals are still helpful, because they reward LLM vendors for attempting the very-hard-to-do thing, instead of rewarding them for training an LLM that's really good at emitting 7%-confidence guesses.
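
To make the incentive concrete (my arithmetic, reusing the 7% figure):

    p = 0.07  # the model's confidence that its guess is right

    # Accuracy-only eval: +1 for a hit, 0 otherwise. Guessing strictly
    # dominates abstaining, so training pressure favors the guess.
    guess_ev   = p * 1 + (1 - p) * 0     # 0.07
    abstain_ev = 0.0                     # "I don't know" scores nothing

    # Penalized eval: +1 right, -1 wrong, 0 for abstaining.
    penalized_guess_ev = p * 1 + (1 - p) * (-1)   # -0.86: abstaining wins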

replies(1): >>45151005 #
throwawaymaths ◴[] No.45151005[source]
You're missing the point. Negative marking for random guesses on the SAT is fine; for a classifier you could trivially turn that strategy into a cost function and backpropagate. But how do you give negative weight to a wrong answer when training a transformer?
replies(2): >>45151264 #>>45152185 #
ACCount37 ◴[] No.45151264[source]
In RLVR? Quite easily.

And OpenAI induced hallucinations in o3 with RLVR mistakes, not with a failed pre-training run. They used o4-mini as an example: similar training to o3, and similar issues.

Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.
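
For a flavor of what "quite easily" can look like (a toy REINFORCE-style sketch of my own, not OpenAI's actual setup): the reward is assigned to the whole sampled completion after decoding, so token boundaries and leading spaces never enter into it, and an explicit abstention can be scored 0 instead of the -1 a wrong answer gets:

    import torch

    def rlvr_reward(completion: str, gold: str) -> float:
        # Sequence-level reward computed on the decoded string, so
        # tokenization details don't matter.
        text = completion.strip().lower()
        if text == "i don't know":
            return 0.0                     # abstention: no penalty
        return 1.0 if text == gold.strip().lower() else -1.0

    def reinforce_loss(token_logprobs: torch.Tensor, reward: float) -> torch.Tensor:
        # REINFORCE: maximize reward * log p(completion), i.e. minimize its
        # negative. A negative reward pushes the sampled tokens' probabilities down.
        return -reward * token_logprobs.sum()

    # Toy usage with stand-in log-probs for a sampled (wrong) completion:
    logps = torch.log(torch.tensor([0.5, 0.4, 0.6], requires_grad=True))
    loss = reinforce_loss(logps, rlvr_reward("Lyon", "Paris"))
    loss.backward()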
