Ask HN: Any insider takes on Yann LeCun's push against current architectures?

385 points vessenes | 2 comments | 10 Mar 25 19:41 UTC

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at each step leads to runaway errors -- these can't be damped mathematically.
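
For what it's worth, the usual back-of-the-envelope form of this argument can be sketched as below. It assumes each token independently has some fixed chance `per_token_error` of going wrong, which is a strong simplification. The function name and the 1% figure are made up for illustration.

```python
def p_all_correct(per_token_error: float, n_tokens: int) -> float:
    """Chance that every one of n_tokens is acceptable.

    Assumes errors are independent and equally likely at each step,
    which real models do not strictly satisfy.
    """
    return (1.0 - per_token_error) ** n_tokens


# Illustrative only: a hypothetical 1% per-token error rate.
for n in (10, 100, 1000):
    print(n, p_all_correct(0.01, n))
```

Under that independence assumption the chance of a fully correct answer shrinks exponentially with length. Whether real models actually behave this way is exactly what is in dispute.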

Instead, he offers the idea that we should have an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and if there's any engineering done around it. I can't find much after the release of I-JEPA from his group.

bravura ◴[14 Mar 25 22:40 UTC] No.43368085[source]▶

>>43325049 (OP) #

Okay I think I qualify. I'll bite.

LeCun's argument is this:

1) You can't learn an accurate world model just from text.

2) Multimodal learning (vision, language, etc.) and interaction with the environment are crucial for true learning.

He and people like Hinton and Bengio have been saying for a while that there are tasks that mice can understand that an AI can't. And that even reaching mouse-level intelligence would be a breakthrough, but we cannot achieve that through language learning alone.

A simple example from "How Large Are Lions? Inducing Distributions over Quantitative Attributes" (https://arxiv.org/abs/1906.01327) is this: Learning the size of objects using pure text analysis requires significant gymnastics, while vision demonstrates physical size more easily. To determine the size of a lion you'll need to read thousands of sentences about lions, or you could look at two or three pictures.

LeCun isn't saying that LLMs aren't useful. He's just concerned with bigger problems, like AGI, which he believes cannot be solved purely through linguistic analysis.

The energy minimization architecture is more about joint multimodal learning.

(Energy minimization is a very old idea. LeCun has been on about it for a while and it's less controversial these days. Back when everyone tried to have a probabilistic interpretation of neural models, it was expensive to compute the normalization term / partition function. Energy minimization basically said: Set up a sensible loss and minimize it.)
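
A toy sketch of the difference, under loud assumptions: the quadratic `energy` function and the five candidate outputs are invented for illustration and have nothing to do with LeCun's actual architectures. It only shows that picking the lowest-energy output needs no normalization, while turning energies into probabilities requires summing over every candidate (the partition function).

```python
import math


def energy(x: float, y: float) -> float:
    # Hypothetical toy energy: low when y is "compatible" with x (here, y near 2*x).
    return (y - 2.0 * x) ** 2


def infer_min_energy(x: float, candidates: list[float]) -> float:
    # Energy-based inference: take the lowest-energy candidate. No normalizer needed.
    return min(candidates, key=lambda y: energy(x, y))


def gibbs_probs(x: float, candidates: list[float]) -> list[float]:
    # Probabilistic reading: p(y|x) = exp(-E(x, y)) / Z(x).
    # Z(x) sums over all candidates, which is the costly part for huge output spaces.
    weights = [math.exp(-energy(x, y)) for y in candidates]
    z = sum(weights)
    return [w / z for w in weights]


candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
best = infer_min_energy(1.0, candidates)
probs = gibbs_probs(1.0, candidates)
```

With five candidates the sum is trivial. The cost only bites when the candidate set is something like "all possible images" or "all possible sentences".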

throw310822 ◴[15 Mar 25 00:32 UTC] No.43368801[source]▶

>>43368085 #

I don't get it.

1) Yes it's true, learning from text is very hard. But LLMs are multimodal now.

2) That "size of a lion" paper is from 2019, which is a geological era ago. The SOTA was GPT-2, which was barely able to spit out coherent text.

3) Have you tried asking a mouse to play chess, reason its way through a physics problem, or write some code? I'm really curious which benchmark mice are beating ChatGPT, Grok, Claude, etc. on.

nextts ◴[15 Mar 25 00:40 UTC] No.43368852[source]▶

>>43368801 #

Mice can survive, forage, and reproduce. Reproduce a mammal, no less. There is a whole load of capability there that is not available in an LLM.

An LLM is essentially a search over a compressed dataset with a tiny bit of reasoning as emergent behaviour. Because it is a parrot, you get "hallucinations": the search failed (like when you get a bad result in Google), or the lossy compression failed, or its reasoning failed.

Obviously there is a lot of stuff the LLM can find in its searches that is reminiscent of the great intelligence of the people who wrote its training data.

The magic trick is impressive because when we judge a human what do we do... an exam? an interview? Someone with a perfect memory can fool many people because most people only acquire memory from tacit knowledge. Most people need to live in Paris to become fluent in French. So we see a robot that has a tiny bit of reasoning and a brilliant memory as a brilliant mind. But this is an illusion.

Here is an example:

User: what is the French Revolution?

Agent: The French Revolution was a period of political and societal change in France which began with the Estates General of 1789 and ended with the Coup of 18 Brumaire on 9 November 1799. Many of the revolution's ideas are considered fundamental principles of liberal democracy and its values remain central to modern French political discourse.

Can you spot the trick?

pfisch ◴[15 Mar 25 00:48 UTC] No.43368909[source]▶

>>43368852 #

When you talk to ~3 year old children they hallucinate quite a lot. Really almost nonstop when you ask them about almost anything.

I'm not convinced that what LLMs are doing is that far off the beaten path from our own cognition.

smelendez ◴[15 Mar 25 00:57 UTC] No.43368957[source]▶

>>43368909 #

That’s interesting.

Lots of modern kids probably get exposed to way more fiction than fact thanks to TV.

I was an only child and watched a lot of cartoons and bad sitcoms as a kid, and I remember for a while my conversational style was way too full of puns, one-liners, and deliberately naive statements made for laughs.

wegfawefgawefg ◴[15 Mar 25 01:08 UTC] No.43369006[source]▶

>>43368957 (TP) #

I wish more people were still like that.
