
385 points vessenes | 2 comments

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at a time leads to runaway errors -- and these can't be damped mathematically.
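(Back-of-envelope version of that argument, as I understand it: if each token independently has some small chance e of going wrong, the probability that a whole n-token answer stays correct is (1 - e)^n, which decays exponentially with length. A toy Python illustration -- the 1% per-token error rate is a made-up number, not a measurement:)

    # Toy sketch of the compounding-error argument; the per-token
    # error rate e is illustrative, not measured.
    def p_all_correct(e: float, n: int) -> float:
        return (1 - e) ** n

    for n in (10, 100, 1000):
        print(n, p_all_correct(0.01, n))
    # -> 10 ~0.904, 100 ~0.366, 1000 ~0.0000432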

Instead, he offers the idea that we should have something like an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.
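(For the curious, here's a minimal sketch of what I take 'energy of an entire response' to mean. Everything here -- the model, the names, the dimensions, the pre-computed embeddings -- is my own guess, not anything from LeCun's papers. The point is just that a network assigns one scalar energy to a whole (prompt, response) pair, and inference picks the candidate with the lowest energy rather than sampling token by token:)

    import torch
    import torch.nn as nn

    class EnergyModel(nn.Module):
        def __init__(self, embed_dim: int = 256):
            super().__init__()
            # Scores a (prompt, response) embedding pair with one
            # scalar: low energy = compatible, high = incompatible.
            self.scorer = nn.Sequential(
                nn.Linear(2 * embed_dim, embed_dim),
                nn.ReLU(),
                nn.Linear(embed_dim, 1),
            )

        def forward(self, prompt_emb: torch.Tensor,
                    response_emb: torch.Tensor) -> torch.Tensor:
            return self.scorer(torch.cat([prompt_emb, response_emb], dim=-1))

    def pick_response(model: EnergyModel,
                      prompt_emb: torch.Tensor,
                      candidate_embs: list[torch.Tensor]) -> int:
        # Inference becomes search: score each whole candidate answer
        # and return the index of the lowest-energy one.
        energies = torch.stack([model(prompt_emb, c) for c in candidate_embs])
        return int(energies.argmin())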

bravura ◴[] No.43368085[source]
Okay I think I qualify. I'll bite.

LeCun's argument is this:

1) You can't learn an accurate world model just from text.

2) Multimodal learning (vision, language, etc) and interaction with the environment is crucial for true learning.

He and people like Hinton and Bengio have been saying for a while that there are tasks a mouse can handle that an AI can't. Even achieving mouse-level intelligence would be a breakthrough, but we cannot get there through language learning alone.

A simple example from "How Large Are Lions? Inducing Distributions over Quantitative Attributes" (https://arxiv.org/abs/1906.01327) is this: Learning the size of objects using pure text analysis requires significant gymnastics, while vision demonstrates physical size more easily. To determine the size of a lion you'll need to read thousands of sentences about lions, or you could look at two or three pictures.

LeCun isn't saying that LLMs aren't useful. He's just concerned with bigger problems, like AGI, which he believes cannot be solved purely through linguistic analysis.

The energy minimization architecture is more about joint multimodal learning.

(Energy minimization is a very old idea. LeCun has been on about it for a while and it's less controversial these days. Back when everyone tried to have a probabilistic interpretation of neural models, it was expensive to compute the normalization term / partition function. Energy minimization basically said: Set up a sensible loss and minimize it.)
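(To make that last point concrete: under a probabilistic reading you'd want p(y|x) = exp(-E(x,y)) / Z(x), where Z(x) sums exp(-E) over every possible y -- intractable when y ranges over, say, all token sequences. A minimal sketch of the energy-based shortcut, using a classic margin-based contrastive loss; this is one standard recipe, not necessarily the exact one LeCun has in mind:)

    import torch

    def contrastive_energy_loss(e_pos: torch.Tensor,
                                e_neg: torch.Tensor,
                                margin: float = 1.0) -> torch.Tensor:
        # Push energy down on observed pairs (e_pos) and up on
        # mismatched/corrupted pairs (e_neg); the normalization term
        # Z never gets computed. Loss reaches zero once every negative
        # sits at least `margin` above its positive.
        return torch.clamp(margin + e_pos - e_neg, min=0.0).mean()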

iainctduncan ◴[] No.43369887[source]
Thanks for articulating this so well. I'm a musician and music/CS PhD student, and as a jazz improviser of advanced skill (30+ years), I'm acutely aware that there are significant areas of intelligence for which linguistic thinking is not only not good enough, but something to be avoided as much as one can (which is bloody hard sometimes). I have found it frustrating, and hard to counter, that the current LLM zeitgeist seems to hinge on a belief that linguistic intelligence is both necessary and sufficient for AGI.
kadushka ◴[] No.43369957[source]
Most modern LLMs are multimodal.
yahoozoo ◴[] No.43372453[source]
Does it really matter? At the end of the day, all the modalities and their architectures boil down to matrices of numbers and statistical prediction. There's no agency, no soul.
kadushka ◴[] No.43373061[source]
At the end of the day, all modalities boil down to patterns of electrical activity in your brain.
yahoozoo ◴[] No.43373651{3}[source]
The brain is the important part. The electricity just keeps it going. And it’s more than numerical matrices.
kadushka ◴[] No.43374485{4}[source]
You mean soul?
namaria ◴[] No.43376054[source]
You misspelled strawman