
385 points | vessenes | 1 comment

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at a time leads to runaway errors that can't be damped mathematically.
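
(A toy version of that compounding-error argument as I understand it -- if each token is independently wrong with some small probability, the chance of a fully correct response decays geometrically with length. The error rate and lengths below are made-up numbers, purely for illustration:)

    # Toy illustration of the compounding-error claim (made-up numbers).
    # If each token is "wrong" independently with probability eps, the chance
    # of an entirely correct n-token response is (1 - eps) ** n.
    eps = 0.01  # hypothetical per-token error rate
    for n in (10, 100, 1000, 10000):
        print(n, (1 - eps) ** n)
    # prints roughly 0.90, 0.37, 4e-05, 2e-44 -- errors compound instead of damping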

Instead, he proposes an 'energy minimization' architecture; as I understand it, such a model would assign an 'energy' to an entire response, and training would try to minimize that energy.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.
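
(To make 'the energy of an entire response' concrete, here's how I picture it -- a small network scores whole candidate responses against the prompt, and the lowest-energy one wins, instead of committing to one token at a time. This is my own toy sketch in PyTorch, not LeCun's actual architecture; the embedding sizes and the scoring network are placeholders.)

    import torch
    import torch.nn as nn

    # Hypothetical energy function: maps (prompt embedding, response embedding)
    # to a single scalar "energy"; lower means the pair is judged more compatible.
    energy_net = nn.Sequential(
        nn.Linear(2 * 64, 128),
        nn.ReLU(),
        nn.Linear(128, 1),
    )

    def energy(prompt_emb, response_emb):
        return energy_net(torch.cat([prompt_emb, response_emb], dim=-1)).squeeze(-1)

    # Inference: score several complete candidate responses and keep the
    # one with the lowest energy, rather than sampling token by token.
    prompt_emb = torch.randn(64)     # stand-in for an encoded prompt
    candidates = torch.randn(5, 64)  # stand-ins for 5 encoded candidate responses
    energies = energy(prompt_emb.expand(5, -1), candidates)
    best = candidates[torch.argmin(energies)]

Training would then push energy() down on observed (prompt, response) pairs and up on mismatched ones -- but again, that's my reading, not a description of his actual system.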

bravura ◴[] No.43368085[source]
Okay I think I qualify. I'll bite.

LeCun's argument is this:

1) You can't learn an accurate world model just from text.

2) Multimodal learning (vision, language, etc) and interaction with the environment is crucial for true learning.

He and people like Hinton and Bengio have been saying for a while that there are tasks a mouse can handle that an AI can't, and that even reaching mouse-level intelligence would be a breakthrough -- but we cannot get there through language learning alone.

A simple example from "How Large Are Lions? Inducing Distributions over Quantitative Attributes" (https://arxiv.org/abs/1906.01327): learning the size of objects through pure text analysis requires significant gymnastics, while vision conveys physical size far more directly. To pin down the size of a lion you'd need to read thousands of sentences about lions, or you could look at two or three pictures.
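
(A toy illustration of the text-side gymnastics -- scraping numeric mentions out of sentences and pooling them into a rough distribution. The sentences and the regex are mine, not the paper's actual method:)

    import re
    import statistics

    # Made-up sentences standing in for a corpus of text about lions.
    sentences = [
        "An adult male lion can reach 2.5 m in head-body length.",
        "Lionesses are typically around 1.6 m long.",
        "One record lion measured nearly 3 m from nose to tail.",
    ]

    # Treat every number followed by "m" as a length estimate.
    lengths = [float(x) for s in sentences
               for x in re.findall(r"(\d+(?:\.\d+)?)\s*m\b", s)]

    print("mean length (m):", statistics.mean(lengths))
    print("spread (m):", statistics.pstdev(lengths))

And that's before handling units, tail-included vs. head-body measurements, figurative uses, and so on. A couple of photos next to a known reference object give you roughly the same information directly.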

LeCun isn't saying that LLMs aren't useful. He's just concerned with bigger problems, like AGI, which he believes cannot be solved purely through linguistic analysis.

The energy minimization architecture is more about joint multimodal learning.

(Energy minimization is a very old idea. LeCun has been on about it for a while, and it's less controversial these days. Back when everyone tried to give neural models a probabilistic interpretation, the normalization term / partition function was expensive to compute. Energy minimization basically says: set up a sensible loss and minimize it, without worrying about normalizing anything.)
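
(To illustrate the normalization point: a probabilistic model needs p(x) = exp(-E(x)) / Z, and the partition function Z sums over every possible x, which is what gets expensive. A contrastive/margin loss just pushes the energy of observed data below the energy of corrupted data and never touches Z. A minimal sketch, with a made-up energy network and random stand-in data:)

    import torch
    import torch.nn as nn

    energy_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

    x_observed = torch.randn(8, 16)   # stand-in batch of real data
    x_corrupted = torch.randn(8, 16)  # stand-in batch of negatives

    e_pos = energy_net(x_observed)
    e_neg = energy_net(x_corrupted)

    # Margin loss: want observed energy at least `margin` below corrupted energy.
    margin = 1.0
    loss = torch.relu(margin + e_pos - e_neg).mean()
    loss.backward()  # gradients flow without ever computing a partition function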

replies(16): >>43368212 #>>43368251 #>>43368801 #>>43368817 #>>43369778 #>>43369887 #>>43370108 #>>43370284 #>>43371230 #>>43371304 #>>43371381 #>>43372224 #>>43372695 #>>43372927 #>>43373240 #>>43379739 #
somenameforme ◴[] No.43371304[source]
Is that what he's arguing? My reading is that LLMs rely on a probabilistic choice of the next token given the previous ones. When they're wrong -- which the technology all but guarantees will happen with some significant frequency -- you get cascading errors. It's like science, where we all build on the shoulders of giants: if one of those shoulders turns out to have simply been wrong, everything built on top of it becomes increasingly absurd. E.g., how the assumption of a geocentric universe inevitably led to epicycles, which led to ever more elaborate, and plainly wrong, 'outputs.'
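
(The mechanism stripped down: each token is drawn from a distribution conditioned on everything generated so far, so one bad draw changes the conditioning for every step after it. The next_token_probs function below is a dummy stand-in for a trained model, just to show the loop structure:)

    import random

    def next_token_probs(prefix):
        # Dummy stand-in: a real LLM returns a distribution that depends on the
        # entire prefix, which is exactly why an early wrong token skews every
        # later choice.
        vocab = ["a", "b", "c", "<eos>"]
        weights = [1.0, 1.0, 1.0, 0.5]
        return vocab, weights

    prefix = ["How", "do", "I", "do", "[x]", "?"]
    while prefix[-1] != "<eos>" and len(prefix) < 20:
        vocab, weights = next_token_probs(prefix)
        prefix.append(random.choices(vocab, weights=weights)[0])  # sampled, not verified
    print(" ".join(prefix))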

Without any 'understanding' of what they're saying, they will remain irredeemably dysfunctional. Hence the typical pattern with LLMs:

---

How do I do [x]?

You do [a].

No that's wrong because reasons.

Oh I'm sorry. You're completely right. Thanks for correcting me. I'll keep that in mind. You do [b].

No that's also wrong because reasons.

Oh I'm sorry. You're completely right. Thanks for correcting me. I'll keep that in mind. You do [a].

FML

---

More advanced systems might add a [c] or a [d], but that's just more noise before the same pattern repeats. DeepSeek's more visible (and lengthy) reasoning demonstrates this perhaps most clearly: it can't stop coming back to the same wrong (but statistically probable) answer, and ping-ponging off that answer -- which it at least acknowledges is wrong, thanks to the user's input -- makes up basically the entirety of its reasoning phase.

replies(2): >>43371489 #>>43401323 #
1. gsf_emergency_2 ◴[] No.43371489[source]
on "stochastic parrots"

Table stakes for sentience: knowing when the best answer is not good enough... try prompting LLMs with that.

It's related to LeCun's (and Ravid's) subtle question I mentioned in passing below:

To Compress Or Not To Compress?

(For even a vast majority of Humans, except tacitly, that is not a question!)