
385 points | vessenes | 1 comment

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, the token choice made at each step leads to runaway errors, and these can't be damped mathematically.

In its place, he offers the idea that we should have something like an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.
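
A crude way I picture it (purely my own toy sketch, almost certainly not what LeCun means in detail; the `energy` function here is a made-up stand-in for a learned model):

```python
# Toy sketch of "energy of an entire response", not LeCun's actual proposal.
# An energy model scores a whole (prompt, response) pair; lower = better.
# Inference then becomes search for a low-energy response, instead of
# committing to one token at a time.

def energy(prompt: str, response: str) -> float:
    # Stand-in for a learned energy function E(prompt, response).
    # Here: pretend responses that share words with the prompt are better.
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    return 0.01 * len(response) - overlap

def pick_response(prompt: str, candidates: list[str]) -> str:
    # Minimize energy over whole candidate responses.
    return min(candidates, key=lambda r: energy(prompt, r))

print(pick_response(
    "What causes tides on Earth?",
    ["The Moon's gravity causes tides on Earth.",
     "Tides are caused by wind patterns."],
))
```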

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.

ActorNightly | No.43325670
Not an official ML researcher, but I do happen to understand this stuff.

The problem with LLMs is that the output is inherently stochastic - i.e. there isn't an "I don't have enough information" option. This is due to the fact that LLMs are basically just giant lookup maps with interpolation.

Energy minimization is more of an abstract approach where you can use architectures that don't rely on things like differentiability. True AI won't be solely feedforward architectures like current LLMs. To give an answer, they will basically determine an algorithm on the fly that includes computation and search. To learn that algorithm (or its parameters) at training time, you need something that doesn't rely on continuous values but still converges to the right answer. So instead you assign a fitness score, like memory use or compute cycles, and optimize based on that. This is basically how search works with genetic algorithms or PSO.
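
For what it's worth, here's a minimal sketch of that kind of fitness-driven search, a plain genetic algorithm on a toy problem (the target, fitness function, and parameters are all invented for illustration):

```python
import random

# Minimal genetic-algorithm sketch: optimizing a non-differentiable fitness
# score by selection + mutation instead of gradients. The genome, fitness,
# and hyperparameters are toy stand-ins, purely for illustration.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical "right answer"

def fitness(genome):
    # Discrete score (could just as well be compute cycles or memory use);
    # no gradient exists, and none is needed.
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]

for generation in range(100):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    parents = population[:10]                              # keep the fittest
    population = parents + [mutate(random.choice(parents)) for _ in range(20)]

best = max(population, key=fitness)
print(generation, best, fitness(best))
```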

seanhunter | No.43365410
> The problem with LLMs is that the output is inherently stochastic - i.e. there isn't an "I don't have enough information" option. This is due to the fact that LLMs are basically just giant lookup maps with interpolation.

I don't think this explanation is correct. The output of the model, after all the attention layers and the final softmax (as I understand it), is a probability distribution over tokens. So the model as a whole does have an ability to score low confidence in something by assigning it a low probability.

The problem is that the thing being scored is a token (part of a word). So the LLM can say "I don't have enough information" about the next part of a word, but it has no ability to say "I don't know what on earth I'm talking about" in general, i.e. not tied to any particular token.
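
To make that concrete, here's roughly where the per-token confidence lives (a sketch using the Hugging Face transformers API, with gpt2 as a stand-in model; the exact prompt and model don't matter):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: the model's output really is a distribution over next *tokens*,
# so uncertainty exists, but only about the next token, not about whether
# the overall answer is grounded. gpt2 is just a small example model.

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The chemical symbol for tungsten is"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token only

entropy = torch.distributions.Categorical(logits=logits).entropy()
print(f"next-token entropy: {entropy.item():.2f} nats")

top = torch.topk(torch.softmax(logits, dim=-1), 5)
for p, i in zip(top.values, top.indices):
    print(f"{float(p):.3f}  {tok.decode(int(i))!r}")
```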

duskwuff | No.43365655
Right. And, as a result, low token-level confidence can end up indicating "there are other ways this could have been worded" or "there are other topics which could have been mentioned here" just as often as it does "this output is factually incorrect". Possibly even more often, in fact.
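
A toy illustration of that failure mode (the distributions below are invented numbers, not real model outputs): probability mass split across valid phrasings makes next-token entropy look high even when the underlying "knowledge" is solid.

```python
import math

# Hypothetical next-token distributions after "Tides are mainly caused by ..."
# Both "know" the answer; the second just has several ways to word it.
one_phrasing   = {"the": 0.95, "gravity": 0.03, "wind": 0.02}
many_phrasings = {"the": 0.35, "gravitational": 0.30, "lunar": 0.20,
                  "gravity": 0.13, "wind": 0.02}

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

print(entropy(one_phrasing))    # low: looks "confident"
print(entropy(many_phrasings))  # much higher, yet nothing here is wrong
```
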
vessenes | No.43365813
My first reaction is that the model itself can't, but a sampling architecture around it probably could. I'm trying to understand whether what we have as a whole architecture for most inference today is responsive to the critique or not.
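
One concrete version of "a sampling architecture probably could" (my own sketch of a crude self-consistency check, not anything specific being shipped; `generate_fn` is a placeholder for whatever model call you have): sample the model several times and treat disagreement across whole answers as a "don't know" signal.

```python
import random
from collections import Counter

# Sketch of a sampling-level "I don't know": sample several full answers at
# nonzero temperature and measure how much they agree. Exact-string matching
# is deliberately crude; a real version would need semantic comparison.

def answer_with_abstention(generate_fn, prompt, n=8, threshold=0.5):
    answers = [generate_fn(prompt).strip().lower() for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    if agreement < threshold:
        return "I don't have enough information.", agreement
    return best, agreement

# Usage with a fake "model" that waffles between two answers:
fake_model = lambda prompt: random.choice(["canberra", "sydney"])
print(answer_with_abstention(fake_model, "What is the capital of Australia?"))
```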