
385 points vessenes | 2 comments

So, LeCun has been quite public in saying that he believes LLMs will never stop hallucinating because, essentially, choosing one token at a time leads to runaway errors -- errors that can't be damped mathematically.
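For intuition, here is a rough numeric sketch of that compounding argument as I read it (the per-token error rates are made up for illustration, not anything LeCun has stated): if each generated token is wrong with some small independent probability e, the chance of an error-free sequence decays geometrically with its length.

    # Toy illustration of geometric error compounding (assumed independent,
    # per-token error rate e; purely illustrative numbers).
    for e in (0.01, 0.02, 0.05):
        for n in (100, 1000):
            p_ok = (1 - e) ** n
            print(f"e={e:.2f}, n={n}: P(no error yet) ~ {p_ok:.4f}")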

Instead, he proposes an 'energy minimization' architecture; as I understand it, this would assign an 'energy' to an entire response, and training would try to minimize that energy.
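To make the contrast concrete, here is a toy sketch of the "score the whole response" idea -- not LeCun's actual JEPA/energy-based design, just the rough shape of it, with a hypothetical energy() scorer standing in for a learned model:

    # Hypothetical sketch: rank complete candidate answers by a learned energy
    # and keep the lowest-energy one, instead of committing token by token.
    def pick_response(prompt, candidates, energy):
        return min(candidates, key=lambda r: energy(prompt, r))

    # Dummy usage with a stand-in scorer (shorter = lower energy, purely illustrative):
    best = pick_response("Q?", ["a long rambling answer", "a short answer"],
                         lambda p, r: len(r))
    print(best)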

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think of LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.

chriskanan ◴[] No.43372073[source]
A lot of the responses seem to be answering a different question: "Why does LeCun think LLMs won't lead to AGI?" I could answer that, but the question you are asking is "Why does LeCun think hallucinations are inherent in LLMs?"

To answer your question, think about how we train LLMs: we have them learn the statistical distribution of all written human language, such that given a chunk of text (a prompt, etc.) the model samples its output distribution to produce the next token (word, sub-word, etc.) and keeps doing that. It never learns to judge what is true or false, and during training it never needs to ask "Do I already know this?" It is just spoon-fed information that it has to memorize, and it has no ability to acquire metacognition, which is something it would need to be trained to attain. As humans, we know what we don't know (to an extent) and can recognize when we already know something or don't, such that we can say "I don't know." During training, an LLM is never taught to do this sort of introspection, so it never will know what it doesn't know.
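Here is a minimal toy sketch of that sampling loop (names like next_token_probs are mine, and the "model" is just random logits rather than a real network): nothing in it checks whether the continuation is true or whether the model "knows" the answer; it only samples from a learned distribution.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["Paris", "London", "is", "the", "capital", "of", "France", "."]

    def next_token_probs(context):
        # Stand-in for the network: real logits would come from a transformer
        # conditioned on `context`.
        logits = rng.normal(size=len(vocab))
        e = np.exp(logits - logits.max())
        return e / e.sum()

    tokens = ["The", "capital", "of", "France", "is"]
    for _ in range(3):
        probs = next_token_probs(tokens)
        tokens.append(rng.choice(vocab, p=probs))  # sample the next token
    print(" ".join(tokens))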

I have a bunch of ideas about how to address this with a new architecture and a lifelong learning training paradigm, but it has been hard to execute. I'm an AI professor, but really pushing the envelope in that direction requires, I think, a small team (10-20) of strong AI scientists and engineers working collaboratively, plus significant computational resources. It just can't be done efficiently in academia, where PhD student trainees all need to be first author and work largely in isolation. By the time AI PhD students get good, they graduate.

I've been trying to find the time to get a start-up going around this. With Terry Sejnowski, I pitched my ideas to a group affiliated with Schmidt Sciences that funds science non-profits at around $20M per year for 5 years. They claimed to love my ideas, but didn't go for it....

replies(1): >>43374223 #
1. emrah ◴[] No.43374223[source]
Would you care to post your ideas somewhere online so others can read, critique, try them, etc.?
replies(1): >>43375264 #
2. random3 ◴[] No.43375264[source]
"we love your ideas" == no

"when do you close the round?" = maybe

money in the bank account = yes