LLMs hallucinate because they are language models: stochastic models of language. They model language, not truth.
If truthful responses to a given prompt are common in the training set, the output is more likely to be useful. It feels like we stumbled into that property and decided: OK, this is useful as an information retrieval tool. Now we use RL to reinforce that useful behaviour. But underneath, it’s still a (biased) language model.
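To make the point concrete, here is a toy sketch, nothing like a production LLM and with an invented corpus, showing how a model that only tracks word frequencies will happily assert a falsehood whenever the falsehood is common in its training text:

```python
import random
from collections import Counter, defaultdict

# Toy "training set" (invented for illustration): the model sees only
# text, with no signal distinguishing true statements from false ones.
corpus = [
    "the capital of australia is canberra",
    "the capital of australia is sydney",   # common misconception, equally fluent
    "the capital of australia is canberra",
    "the capital of australia is sydney",
]

# Build a bigram model: estimate P(next_word | word) by counting.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def sample_next(word):
    """Sample the next word in proportion to how often it followed
    `word` in the corpus: frequency of language, not truth."""
    options = counts[word]
    return random.choices(list(options), weights=list(options.values()))[0]

def generate(prompt, max_len=10):
    words = prompt.split()
    while len(words) < max_len and words[-1] in counts:
        words.append(sample_next(words[-1]))
    return " ".join(words)

# Roughly half of these completions will name the wrong capital,
# because the model only knows which words tend to follow which.
for _ in range(5):
    print(generate("the capital of australia is"))
```

Skew the corpus so the true continuation dominates and the output gets more useful; the mechanism never changes, only the bias of the distribution being sampled.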
I don’t think that’s how humans work. There’s more to it. We need a model of language, but it’s not sufficient to explain our mental mechanisms. We have ways of thinking beyond generating language fragments.
Trying to eliminate every case where a stochastic model the size of an LLM produces an “undesirable” or “untrue” response seems rather odd.