Ask HN: Any insider takes on Yann LeCun's push against current architectures?

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at each step leads to runaway errors that can't be damped mathematically.
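The compounding-error argument is usually formalized like this: if each generated token independently has some probability `e` of pushing the answer off course, with no way to recover, then an `n`-token answer is fully correct with probability (1 - e)^n. The independence and no-recovery assumptions are exactly what critics dispute, so treat this as a toy sketch of the argument, not a model of any real LLM.

```python
def p_correct(e: float, n: int) -> float:
    """Probability an n-token answer stays correct, assuming each token
    independently derails it with probability e and errors never recover."""
    return (1.0 - e) ** n


def tokens_until_coin_flip(e: float) -> int:
    """Smallest answer length at which p_correct drops to 50% or below."""
    n = 0
    while p_correct(e, n) > 0.5:
        n += 1
    return n


for e in (0.001, 0.01, 0.05):
    print(f"e={e}: P(100 tokens correct)={p_correct(e, 100):.3f}, "
          f"50% after {tokens_until_coin_flip(e)} tokens")
```

Under these assumptions, even a 1% per-token error rate leaves a 100-token answer fully correct only about 37% of the time; the decay is exponential in length.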

Instead, he proposes an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.
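To make "energy of an entire response" concrete, here is a toy contrast between token-by-token greedy choice and scoring whole candidate responses. The tiny next-token table is made up, and negative log-probability is used only as a stand-in energy (an assumption for illustration); in LeCun's framing the energy function would be learned directly rather than derived from a next-token model.

```python
import math

# Hypothetical toy next-token model: prefix (tuple of tokens) -> {next token: probability}.
MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"X": 0.5, "Y": 0.5},
    ("B",): {"X": 0.9, "Y": 0.1},
}
STEPS = 2


def greedy_decode() -> tuple:
    """Pick the single most likely token at each step, never revisiting a choice."""
    prefix = ()
    for _ in range(STEPS):
        dist = MODEL[prefix]
        prefix += (max(dist, key=dist.get),)
    return prefix


def energy(seq: tuple) -> float:
    """Stand-in energy for a whole response: its negative log-probability."""
    return -sum(math.log(MODEL[seq[:i]][tok]) for i, tok in enumerate(seq))


def all_responses(prefix: tuple = ()):
    """Enumerate every complete response (feasible only for toy models)."""
    if len(prefix) == STEPS:
        yield prefix
        return
    for tok in MODEL[prefix]:
        yield from all_responses(prefix + (tok,))


greedy = greedy_decode()
best = min(all_responses(), key=energy)
print("greedy:", greedy, round(energy(greedy), 3))
print("lowest-energy response:", best, round(energy(best), 3))
```

Greedy commits to "A" because it looks best locally, but the lowest-energy complete response starts with "B". Exhaustive enumeration does not scale, so searching or optimizing over whole responses efficiently is part of what an energy-based design would have to solve.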

I'll admit I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.

bravura [14 Mar 25 22:40 UTC] No.43368085

>>43325049 (OP)

Okay, I think I qualify. I'll bite.

LeCun's argument is this:

1) You can't learn an accurate world model just from text.

2) Multimodal learning (vision, language, etc.) and interaction with the environment are crucial for true learning.

He and people like Hinton and Bengio have been saying for a while that there are tasks mice can understand that an AI can't, and that even achieving mouse-level intelligence would be a breakthrough, but one we cannot reach through language learning alone.

A simple example from "How Large Are Lions? Inducing Distributions over Quantitative Attributes" (https://arxiv.org/abs/1906.01327) is this: Learning the size of objects using pure text analysis requires significant gymnastics, while vision demonstrates physical size more easily. To determine the size of a lion you'll need to read thousands of sentences about lions, or you could look at two or three pictures.

LeCun isn't saying that LLMs aren't useful. He's just concerned with bigger problems, like AGI, which he believes cannot be solved purely through linguistic analysis.

The energy minimization architecture is more about joint multimodal learning.

(Energy minimization is a very old idea. LeCun has been on about it for a while and it's less controversial these days. Back when everyone tried to have a probabilistic interpretation of neural models, it was expensive to compute the normalization term / partition function. Energy minimization basically said: Set up a sensible loss and minimize it.)
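A small sketch of that point: turning energies into probabilities (a Boltzmann distribution) needs the partition function Z, which sums over every possible answer, while picking the best answer, or training with a margin-style loss, only needs energy comparisons. The energies below are arbitrary numbers, and the hinge loss is one classic energy-based objective chosen for illustration, not necessarily what any particular LeCun model uses.

```python
import math


def boltzmann(energies: dict) -> dict:
    """Convert energies to probabilities: p(y) = exp(-E(y)) / Z."""
    z = sum(math.exp(-e) for e in energies.values())  # partition function: sums over ALL candidates
    return {y: math.exp(-e) / z for y, e in energies.items()}


def hinge_loss(e_correct: float, e_wrong: float, margin: float = 1.0) -> float:
    """Margin loss: push the correct answer's energy at least `margin` below a wrong one's.
    Only an energy comparison is needed; Z never appears."""
    return max(0.0, margin + e_correct - e_wrong)


# Arbitrary illustrative energies for three candidate answers.
energies = {"cat": 0.2, "dog": 1.5, "lion": 3.0}

probs = boltzmann(energies)
print(probs)
print("argmin energy:", min(energies, key=energies.get))
print("argmax prob:  ", max(probs, key=probs.get))
print("loss, cat correct vs dog:", hinge_loss(energies["cat"], energies["dog"]))
print("loss, dog correct vs cat:", hinge_loss(energies["dog"], energies["cat"]))
```

The lowest-energy answer and the most probable answer are always the same, so for choosing an answer the expensive normalization buys nothing; with a realistic output space (every possible sentence), computing Z exactly is intractable.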

codenlearn [14 Mar 25 23:05 UTC] No.43368251

>>43368085

Doesn't language itself encode multimodal experiences? Take this case: when we write text, we have the skill and opportunity to encode visual, tactile, and other sensory experiences into words. The fact is, LLMs trained on massive text corpora are indirectly learning from human multimodal experiences translated into language. This might be less direct than firsthand sensory experience, but potentially more efficient, since it leverages human-curated information. Text can describe simulations of physical environments. Models might learn physical dynamics through textual descriptions of physics, video game logs, scientific papers, etc. A sufficiently comprehensive text corpus might contain enough information to develop reasonable physical intuition without direct sensory experience.

As I'm typing this, I'm realizing one thing: the quality and completeness of the data fundamentally determine how well an AI system will work, and with text alone this is hard to achieve, so a multimodal experience is a must.

Thank you for explaining this in terms simple enough for me to understand.

ThinkBeat [14 Mar 25 23:55 UTC] No.43368574

>>43368251

No.

> The sun feels hot on your skin.

No matter how many times you read that, you cannot understand what the experience is like.

> You can read a book about Yoga and read about the Tittibhasana pose

But just by reading, you will not understand what it feels like. And unless you are in great shape and have great balance, you will fail for a while before you get it right (which is only human).

I have read what shooting up with heroin feels like, from a few different sources. I am certain that I will have no real idea unless I try it (and I don't want to do that).

Waterboarding. I have read about it. I have seen it on TV. I am certain that is all abstract compared to having someone do it to you.

Hand-eye coordination, balance, color, taste, pain, and so on. How we encode things draws on all our senses, our state of mind, and our experiences up until that time.

We also forget and change what we remember.

Many songs take me back to a certain time, a certain place, a certain feeling. Taste is the same. So are locations.

The way we learn and the way we remember things is incredibly more complex than text.

But if you have shared experiences, then when you write about them, other people will know. Most people have felt the sun hot on their skin.

To different extents this is also true for animals. Now, I don't think most mice can read, but they do learn with many different senses, and remember some combination or permutation of them.

csomar [15 Mar 25 04:48 UTC] No.43370066

>>43368574

All of these "experiences" are encoded in your brain as electricity. So "text" can encode them, though English words might not be the proper way to do it.

chongli [15 Mar 25 06:04 UTC] No.43370354

>>43370066

No, text can only refer to them. There is not a text on this planet that encodes what the heat of the sun feels like on your skin. A person who had never been outdoors could never experience that sensation by reading text.

tgma [15 Mar 25 08:10 UTC] No.43370903

>>43370354

> There is not a text on this planet that encodes what the heat of the sun feels like on your skin.

> A person who had never been outdoors could never experience that sensation by reading text.

I don't think the latter implies the former as obviously as you make it out to be. Unless you believe in some sort of metaphysical description of humans, you can certainly encode the feeling (as mentioned in another comment, it is reduced to electrical signals after all). The only question is how much storage you need for that encoding to reach a given precision. The latter statement, if true, is simply a limitation of your input device to the brain, i.e., you cannot transfer your encoding to the hardware (in this case, a human brain) via reading or listening. There could be higher-bandwidth interfaces like Neuralink that may do that for the human brain, and in the case of AI, an auxiliary device might not be needed at all: the encoding could be directly mmap'd.
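The storage-versus-precision question can be illustrated with plain uniform quantization. This treats a single reading of some signal as a number in [0, 1], a hypothetical simplification: nothing here models real neural coding. It only shows how the worst-case error shrinks as more bits are spent on the encoding.

```python
def quantize(x: float, bits: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Snap x to the nearest of 2**bits evenly spaced levels in [lo, hi] (bits >= 1)."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + round((x - lo) / step) * step


def worst_case_error(bits: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Maximum |quantize(x) - x| for x in [lo, hi]: half a quantization step."""
    return (hi - lo) / (2 ** bits - 1) / 2


for bits in (1, 4, 8, 16):
    print(f"{bits:2d} bits -> worst-case error {worst_case_error(bits):.2e}")
```

Each extra bit roughly halves the worst-case error, which is the sense in which precision is "just" a matter of storage.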

chongli [15 Mar 25 08:56 UTC] No.43371116

>>43370903

Electrical signals are not the same as subjective experiences. While a machine may be able to record and play back these signals for humans to experience, that does not imply that the experiences themselves are recorded nor that the machine has any access to them.

A deaf person can use a tape recorder to record and play back a symphony but that does not encode the experience in any way the deaf person could share.

mietek [15 Mar 25 17:12 UTC] No.43373811

>>43371116

Those are some strong claims, given that philosophers (e.g. Chalmers vs Dennett) can’t even agree whether subjective experiences exist or not.

chongli [15 Mar 25 18:46 UTC] No.43374410

>>43373811

Even if you’re a pure Dennettian functionalist you still commit to a functional difference between signals in transit (or at rest) and signals being processed and interpreted. Holding a cassette tape with a recording of a symphony is not the same as hearing the symphony.

Applying this case to AI gives rise to the Chinese Room argument. LLMs’ propensity for hallucinations invites this comparison.

mietek [15 Mar 25 20:21 UTC] No.43374917

>>43374410

Are LLMs having subjective experiences? Surely not. But if you claim that human subjective experiences are not the result of electrical signals in the brain, then what exactly is your position? Dualism?

Personally, I think the Chinese room argument is invalid. In order for the person in the room to respond to any possible query by looking up the query in a book, the book would need to be infinite and therefore impossible as a physical object. Otherwise, if the book is supposed to describe an algorithm for the person to follow in order to compute a response, then that algorithm is the intelligent entity that is capable of understanding, and the person in the room is merely the computational substrate.

chongli [15 Mar 25 22:40 UTC] No.43375616

>>43374917

The Chinese Room is a perfect analogy for what's going on with LLMs. The book is not infinite, it's flawed. And that's the point: we keep bumping into the rough edges of LLMs with their hallucinations and faulty reasoning because the book can never be complete. Thus we keep getting responses that make us realize the LLM is not intelligent and has no idea what it's saying.

The only part where the book analogy falls down has to do with the technical implementation of LLMs, with their tokenization and their vast sets of weights. But that is merely an encoding for the training data. Books can be encoded similarly by using traditional compression algorithms (like LZMA).
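For readers unfamiliar with the comparison, here is what encoding a "book" with LZMA looks like using Python's standard `lzma` module. The book is a stand-in repeated sentence, chosen for illustration. Note that LZMA is lossless: the round trip gives back the exact bytes.

```python
import lzma

# Stand-in "book": highly repetitive text compresses very well.
book = ("It was the best of times, it was the worst of times. " * 500).encode("utf-8")

packed = lzma.compress(book)          # default .xz container and preset
restored = lzma.decompress(packed)

assert restored == book               # lossless: byte-for-byte identical
print(f"{len(book)} bytes -> {len(packed)} bytes "
      f"({len(packed) / len(book):.1%} of original)")
```

Real prose compresses far less than this repeated sentence does; the example only shows the encode/decode round trip.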

og_kalu [16 Mar 25 13:50 UTC] No.43379081

>>43375616

>The book is not infinite, it's flawed.

Oh, and the human book is surely infinite and unflawed, right?

>we keep bumping into the rough edges of LLMs with their hallucinations and faulty reasoning

Both things humans also do in excess.

The Chinese Room is nonsensical. Can you point to any part of your brain that understands English? I guess you are a Chinese Room then.

chongli [16 Mar 25 16:32 UTC] No.43380249

>>43379081

Humans have the ability to admit when they do not know something. We say “sorry, I don’t know, let me get back to you.” LLMs cannot do this. They either have the right answer in the book or they make up nonsense (hallucinate). And they do not even know which one they’re doing!

og_kalu [16 Mar 25 17:50 UTC] No.43380799

>>43380249

>Humans have the ability to admit when they do not know something.

No, not really. It's not even rare that a human confidently says and believes something and really has no idea what he/she's talking about.

>We say “sorry, I don’t know, let me get back to you.” LLMs cannot do this

Yeah, they can. And they can do it much better than chance. They just don't do it as well as humans.

>And they do not even know which one they’re doing!

There's plenty of research suggesting this isn't the case.

https://news.ycombinator.com/item?id=41418486

chongli [16 Mar 25 20:14 UTC] No.43381871

>>43380799

>No not really. It's not even rare that a human confidently says and believes something and really has no idea what he/she's talking about.

Like you’re doing right now? People say “I don’t know” all the time. Especially children. That people also exaggerate, bluff, and outright lie is not proof that people don’t have this ability.

When people are put in situations where they will be shamed or suffer other social stigmas for admitting ignorance then we can expect them to be less than candid.

As for your links to research showing that LLMs do possess the ability of introspection, I have one question: why have we not seen this in consumer-facing tools? Are the LLMs afraid of social stigma?

og_kalu [16 Mar 25 22:58 UTC] No.43383387

>>43381871

>Like you’re doing right now?

Lol Okay

>When people are put in situations where they will be shamed or suffer other social stigmas for admitting ignorance then we can expect them to be less than candid.

Good thing I wasn't talking about that. There's a lot of evidence that human explanations are regularly post-hoc rationalizations they fully believe in. They're not lying to anyone; they just fully believe the nonsense their brain has concocted.

Experiments on choice and preferences https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3196841/

Split Brain Experiments https://www.nature.com/articles/483260a

>As for your links to research showing that LLMs do possess the ability of introspection, I have one question: why have we not seen this in consumer-facing tools? Are the LLMs afraid of social stigma?

Maybe read any of them? If you weren't interested in evidence contrary to your points, you could have just said so and I wouldn't have wasted my time. The 1st and 6th links make it quite clear that current post-training processes hurt calibration a lot.
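For context on "calibration": a model is well calibrated when its stated confidence matches how often it is right (of the answers it gives with about 80% confidence, about 80% are correct). One standard way to measure the gap is expected calibration error (ECE). The sketch below is a generic binned ECE on made-up inputs, not the exact metric or data used in the linked papers.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average gap between confidence and accuracy per bin.

    confidences: model confidence in each answer, in [0, 1]
    correct:     1 if that answer was right, else 0
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes confidence exactly 0.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - avg_conf)
    return ece


# Made-up examples: 10 answers each.
calibrated = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
overconfident = expected_calibration_error([0.95] * 10, [1] * 5 + [0] * 5)
print(f"calibrated: {calibrated:.3f}, overconfident: {overconfident:.3f}")
```

A model that says "95% sure" but is right half the time scores an ECE near 0.45; a model whose confidence tracks its accuracy scores near zero, even if it is often wrong.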
