
385 points by vessenes | 1 comment

So, LeCun has been quite public in saying that he believes LLMs will never stop hallucinating because, essentially, picking one token at a time lets errors compound step by step -- and those errors can't be damped mathematically.
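To make the compounding-error claim concrete (toy numbers of my own, not LeCun's): if each token is wrong with some small independent probability, the chance of a fully correct response decays exponentially with length.

    # Illustrative only: if each token has an independent error
    # probability eps, an n-token response is error-free with
    # probability (1 - eps) ** n.
    for eps in (0.001, 0.01):
        for n in (100, 1000, 10000):
            print(f"eps={eps}, n={n}: P(no error) = {(1 - eps) ** n:.4f}")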

Instead, he offers the idea that we should have an 'energy minimization' architecture; as I understand it, this would assign an 'energy' to an entire response, and training would try to minimize that.
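For illustration only, here's the kind of thing I imagine (the module and loss are my own sketch, not anything LeCun has published): a network that scores a whole (prompt, response) pair with one scalar and is trained contrastively so good responses get lower energy than corrupted ones.

    import torch
    import torch.nn as nn

    class EnergyHead(nn.Module):
        # Hypothetical sketch: one scalar "energy" per (prompt, response) pair.
        def __init__(self, dim: int = 512):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 1))

        def forward(self, prompt_emb, response_emb):
            pair = torch.cat([prompt_emb, response_emb], dim=-1)
            return self.score(pair).squeeze(-1)

    def margin_loss(e_pos, e_neg, margin=1.0):
        # Contrastive objective: observed responses should sit at lower
        # energy than corrupted ones, by at least `margin`.
        return torch.relu(margin + e_pos - e_neg).mean()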

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think of LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.

probably_wrong No.43365762
I haven't read Yann LeCun's take. Based on your description alone, my first impression would be this: there's a paper [1] arguing that "beam search enforces uniform information density in text, a property motivated by cognitive science". UID claims, in short, that a speaker delivers only as much content as they think the listener can take (no more, no less), and the paper claims that beam search enforces this property at generation time.
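For anyone who hasn't seen the mechanism the paper studies, a bare-bones beam search looks like this (the log_probs callback is a stand-in for a real model's next-token head; length normalization omitted for brevity):

    from typing import Callable, List, Tuple

    def beam_search(log_probs: Callable[[List[int]], List[float]],
                    eos: int, beam_width: int = 4,
                    max_len: int = 50) -> List[int]:
        # log_probs(prefix) returns a log-probability for every token id.
        beams: List[Tuple[List[int], float]] = [([], 0.0)]
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq and seq[-1] == eos:
                    candidates.append((seq, score))  # finished beams carry over
                    continue
                for tok, lp in enumerate(log_probs(seq)):
                    candidates.append((seq + [tok], score + lp))
            # Keep only the beam_width highest-scoring partial sequences.
            beams = sorted(candidates, key=lambda c: c[1],
                           reverse=True)[:beam_width]
        return max(beams, key=lambda c: c[1])[0]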

The paper would be a strong argument against your point: if neural architectures already constrain the amount of information a text-generation system delivers in the same way a human (allegedly) does, then I don't see what "energy" measure one could adopt that would perform any better.
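As a rough operationalization (my reading, not necessarily the paper's exact metric), UID is often measured as the variance of per-token surprisal: uniform-density text keeps -log p(token | context) roughly flat.

    import math

    def surprisal_variance(token_probs):
        # token_probs: model probability assigned to each generated token.
        # Lower variance of surprisal = more uniform information density.
        s = [-math.log(p) for p in token_probs]
        mean = sum(s) / len(s)
        return sum((x - mean) ** 2 for x in s) / len(s)

    print(surprisal_variance([0.2, 0.2, 0.2, 0.2]))    # flat text -> 0.0
    print(surprisal_variance([0.9, 0.9, 0.001, 0.9]))  # info spike -> large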

Then again, perhaps they have one in mind and I just haven't read it.

[1] https://aclanthology.org/2020.emnlp-main.170/

replies(1): >>43365789 #
vessenes No.43365789
I believe he’s talking about some sort of ‘energy as measured by distance from the model’s understanding of the world’, as in quite literally a world model. But again, I’m ignorant, hence the post!
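Purely as a guess at what that could mean (JEPA-flavored, and entirely my own sketch): score a response by how far its embedding lands from what a world model predicts.

    import torch
    import torch.nn.functional as F

    def world_model_energy(predicted_state, response_state):
        # Hypothetical energy: distance in embedding space between what
        # a world model predicts and what the response encodes.
        # Low energy = response consistent with the model's world state.
        return 1.0 - F.cosine_similarity(predicted_state,
                                         response_state, dim=-1)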
replies(3): >>43365847 #>>43366229 #>>43373547 #
tyronehed No.43365847
When an architecture is built around world-model building, it is a causal outcome that similar concepts and things end up being stored in similar places; they overlap. As soon as your solution starts to get mathematically complex, you are departing from what the human brain does. I'm not saying a statistical intelligence is impossible in some universe, but going in that direction strays from the only existing intelligence we know about: the human brain. So the best solutions will closely echo neuroscience.
When an architecture is based around world model building, then it is a casual outcome that similar concepts and things end up being stored in similar places. They overlap. As soon as your solution starts to get mathematically complex, you are departing from what the human brain does. Not saying that in some universe it might be possible to make a statistical intelligence, but when you go that direction you are straying away from the only existing intelligences that we know about. The human brain. So the best solutions will closely echo neuroscience.