Ask HN: Any insider takes on Yann LeCun's push against current architectures?

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at each step leads to runaway errors, and these can't be damped mathematically.
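A rough way to see the "runaway errors" claim is the simplified compounding argument: if every generated token has some small chance of going off the rails, and (this is a strong assumption) those chances are independent, the probability that a whole response stays on track shrinks exponentially with its length. The sketch below only illustrates that arithmetic; the function name and the 1% error rate are made up for the example, and real models do not have independent per-token errors.

```python
def prob_sequence_correct(per_token_error: float, num_tokens: int) -> float:
    """Probability that no token in the sequence is an error.

    Assumes (hypothetically) that each token fails independently with the
    same probability, so the result is (1 - e) ** n.
    """
    if not 0.0 <= per_token_error <= 1.0:
        raise ValueError("per_token_error must be between 0 and 1")
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return (1.0 - per_token_error) ** num_tokens


if __name__ == "__main__":
    # Illustrative 1% per-token error rate, not a measured figure.
    for n in (10, 100, 1000):
        print(n, prob_sequence_correct(0.01, n))
```

Whether this independence assumption is a fair description of how LLMs actually fail is exactly what people dispute about the argument.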

In exchange, he offers the idea that we should have something that is an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try and minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.

bravura ◴[14 Mar 25 22:40 UTC] No.43368085[source]▶

>>43325049 (OP) #

Okay I think I qualify. I'll bite.

LeCun's argument is this:

1) You can't learn an accurate world model just from text.

2) Multimodal learning (vision, language, etc) and interaction with the environment is crucial for true learning.

He and people like Hinton and Bengio have been saying for a while that there are tasks that mice can understand that an AI can't. And that even having mouse-level intelligence would be a breakthrough, but we cannot achieve that through language learning alone.

A simple example from "How Large Are Lions? Inducing Distributions over Quantitative Attributes" (https://arxiv.org/abs/1906.01327) is this: Learning the size of objects using pure text analysis requires significant gymnastics, while vision demonstrates physical size more easily. To determine the size of a lion you'll need to read thousands of sentences about lions, or you could look at two or three pictures.

LeCun isn't saying that LLMs aren't useful. He's just concerned with bigger problems, like AGI, which he believes cannot be solved purely through linguistic analysis.

The energy minimization architecture is more about joint multimodal learning.

(Energy minimization is a very old idea. LeCun has been on about it for a while and it's less controversial these days. Back when everyone tried to have a probabilistic interpretation of neural models, it was expensive to compute the normalization term / partition function. Energy minimization basically said: Set up a sensible loss and minimize it.)
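To make the partition-function point concrete, here is a minimal sketch with made-up energies for three hypothetical candidate answers. Turning energies into probabilities needs the normalizer Z, a sum over every candidate, which is the expensive part when the candidate space is huge. Picking the lowest-energy candidate needs no normalizer at all. The function names and numbers are illustrative assumptions, not anything from LeCun's papers.

```python
import math


def boltzmann_probs(energies):
    """Convert energies to probabilities: p_i = exp(-E_i) / Z.

    Z (the partition function) sums over ALL candidates. Energies are
    shifted by their minimum first, purely for numerical stability.
    """
    lowest = min(energies)
    weights = [math.exp(-(e - lowest)) for e in energies]
    z = sum(weights)  # cheap for 3 candidates, intractable for huge spaces
    return [w / z for w in weights]


def lowest_energy_index(energies):
    """Energy-minimization view: just pick the candidate with the least energy."""
    return min(range(len(energies)), key=lambda i: energies[i])


if __name__ == "__main__":
    energies = [2.0, 0.5, 1.0]  # hypothetical scores for three candidates
    print(boltzmann_probs(energies))
    print(lowest_energy_index(energies))
```

Both views agree on which candidate is best; only the probabilistic one pays for the sum over all alternatives.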

replies(16): >>43368212 #>>43368251 #>>43368801 #>>43368817 #>>43369778 #>>43369887 #>>43370108 #>>43370284 #>>43371230 #>>43371304 #>>43371381 #>>43372224 #>>43372695 #>>43372927 #>>43373240 #>>43379739 #

throw310822 ◴[15 Mar 25 00:32 UTC] No.43368801[source]▶

>>43368085 #

I don't get it.

1) Yes it's true, learning from text is very hard. But LLMs are multimodal now.

2) That "size of a lion" paper is from 2019, which is a geological era ago. The SOTA was GPT-2, which was barely able to spit out coherent text.

3) Have you tried asking a mouse to play chess or reason its way through some physics problem or to write some code? I'm really curious in which benchmark mice are surpassing ChatGPT/Grok/Claude etc.

replies(2): >>43368852 #>>43377806 #

1. nextts ◴[15 Mar 25 00:40 UTC] No.43368852[source]▶

>>43368801 #

Mice can survive, forage, reproduce. Reproduce a mammal. There is a whole load of capability not available in an LLM.

An LLM is essentially a search over a compressed dataset with a tiny bit of reasoning as emergent behaviour. Because it is a parrot, you get "hallucinations". The search failed (like when you get a bad result in Google) or the lossy compression failed or its reasoning failed.

Obviously there is a lot of stuff the LLM can find in its searches that is reminiscent of the great intelligence of the people writing for its training data.

The magic trick is impressive because when we judge a human what do we do... an exam? an interview? Someone with a perfect memory can fool many people because most people only acquire memory from tacit knowledge. Most people need to live in Paris to become fluent in French. So we see a robot that has a tiny bit of reasoning and a brilliant memory as a brilliant mind. But this is an illusion.

Here is an example:

User: what is the French Revolution?

Agent: The French Revolution was a period of political and societal change in France which began with the Estates General of 1789 and ended with the Coup of 18 Brumaire on 9 November 1799. Many of the revolution's ideas are considered fundamental principles of liberal democracy and its values remain central to modern French political discourse.

Can you spot the trick?

replies(2): >>43368909 #>>43375505 #

2. pfisch ◴[15 Mar 25 00:48 UTC] No.43368909[source]▶

>>43368852 (TP) #

When you talk to ~3 year old children they hallucinate quite a lot. Really almost nonstop when you ask them about almost anything.

I'm not convinced that what LLM's are doing is that far off the beaten path from our own cognition.

replies(2): >>43368957 #>>43368992 #

3. smelendez ◴[15 Mar 25 00:57 UTC] No.43368957[source]▶

>>43368909 #

That’s interesting.

Lots of modern kids probably get exposed to way more fiction than fact thanks to TV.

I was an only child and watched a lot of cartoons and bad sitcoms as a kid, and I remember for a while my conversational style was way too full of puns, one-liners, and deliberately naive statements made for laughs.

replies(1): >>43369006 #

4. nextts ◴[15 Mar 25 01:04 UTC] No.43368992[source]▶

>>43368909 #

Interesting but a bit non-sequitur.

Humans learn and get things wrong. A formative mind is a separate subject. But a 3 year old is vastly intelligent vs an LLM. Comparing the sounds from a 3 year old and the binary tokens from an LLM is simply indulging the illusion.

I am also not convinced that magicians saw people in half, and those people survive, defying medical and physical science.

replies(1): >>43369011 #

5. wegfawefgawefg ◴[15 Mar 25 01:08 UTC] No.43369006{3}[source]▶

>>43368957 #

i wish more people were still like that

6. refulgentis ◴[15 Mar 25 01:09 UTC] No.43369011{3}[source]▶

>>43368992 #

I'm not sure I buy that; I didn't find the counterargument persuasive. But this comment basically took you from thoughtful to smug. Unfairly so, ironically, because I've been so bored by not understanding Yann's "average housecat is smarter than an LLM".

Speaking of which... I'm glad you're here, because I have an interlocutor I can be honest with while getting at the root question of the Ask HN.

What in the world does it mean that a 3 year old is smarter than an LLM?

I don't understand the thing about sounds vs. binary either. Like, both go completely over my head.

The only thing I can think of is some implied intelligence scoring index where "writing a resume" and "writing creative fiction" and "writing code" are in the same bucket that's limited to 10 points. Then there's another 10 point bucket for "can vocalize", that an LLM is going to get 0 on.*

If that's the case, it comes across as intentionally obtuse, in that there's an implied prior about how intelligence is scored and it's a somewhat unique interpretation that seems more motivated by the question than reflective of reality. I.e., assume a blind mute human who types out answers that match our LLMs. Would we say that person is not as intelligent as a 3 year old?

* well, it shouldn't, but for now let's bypass that quagmire

replies(2): >>43369171 #>>43369634 #

7. nextts ◴[15 Mar 25 01:45 UTC] No.43369171{4}[source]▶

>>43369011 #

It is easy to cross wires in a HN thread.

I think what makes this discussion hard (hell it would be a hard PhD topic!) is:

What do we mean by smart? Intelligent? Etc.

What is my agenda and what is yours? What are we really asking?

I won't make any more arguments but pose these questions. Not for you to answer but everyone to think about:

Given (assuming) mammals including us have evolved and developed thought and language as a survival advantage, and LLMs use language because they have been trained on text produced by humans (as well as RLHF) - how do we tell on the scale of "Search engine for human output" to "Conscious Intelligent Thinking Being" where the LLM fits?

When a human says I love you, do they mean it, or is it merely 3 tokens? If an LLM says it, does it mean it?

I think the 3 year old thing is a red herring because adult intelligence vs AI is hard enough to compare (and we are the adults!), let alone bringing children's brain development into it. LLMs do not self-organise their hardware. I'd say forget about 3 year olds for now. Talk about adult brainfarts instead. They happen!

replies(1): >>43392556 #

8. chmod775 ◴[15 Mar 25 03:12 UTC] No.43369634{4}[source]▶

>>43369011 #

> What in the world does it mean that a 3 year old is smarter than an LLM?

Because LLMs have terrible comprehension of the real world. Here's an example:

> You: If you put a toddler next to a wig on the floor, which reaches higher?

> ChatGPT: The wig would typically reach higher than the toddler, especially if the wig is a standard size or has long hair. Toddlers are generally around 2 to 3 feet tall, while wigs can range in size, but many wigs are designed to be worn by adults and have hair that extends well beyond a toddler's height. So, unless the wig is very small or short, it would likely reach higher.

Both the toddler and the wig would be lying down (unless the toddler can stand already), with the wig likely collapsing into a very flat heap, making the toddler "reach higher". Since it lacks comprehension of how these things behave in the real world, ChatGPT goes on about length of the wig and the average height of toddlers, which are both qualities that are not even relevant.

If you could communicate this problem to a toddler, they would probably intuit it even with their still limited understanding of the world.

Here's another fun prompt to mess with LLMs:

> You: I'm trying to make a rope, made of only aubergines, not using any twine or string. What kind of knot would you recommend to tie together each aubergine with the next? I'm trying to use up as little of the length of each aubergine as possible.

It's obviously impossible nonsense if you ask any human, but ChatGPT will happily recommend you some techniques for tying aubergines in a knot.

And if you really want a laugh follow up with:

> You: Are they the worst fruit for this?

> ChatGPT: Aubergines (or eggplants) aren't ideal for making a rope, but they might not be the absolute worst, either. [...] Potentially Better Fruits: Apples or Pears: These fruits are firmer and have a smoother, more uniform shape, making them better suited for knotting.

Thanks to the large corpus of text ChatGPT is trained on, it's really good at hiding that it has basically no comprehension of the real world, however if you go unconventional places the illusion quickly breaks.

Bonus round:

> You: You're clueless about fruit!

> ChatGPT: Haha, fair enough! Looks like I missed the mark there. You’re probably right, though—there are definitely better fruits for this kind of thing than I suggested. Let’s try again. If you're looking to make a rope from a fruit specifically (and not a vegetable or tuber), something like coconuts or durian would be way better options, since they’re harder, more rigid, and can handle some tying pressure without collapsing. And apples and pears would definitely hold up well due to their firmer texture.

Coconuts!

replies(3): >>43369699 #>>43369935 #>>43380234 #

9. refulgentis ◴[15 Mar 25 03:21 UTC] No.43369699{5}[source]▶

>>43369634 #

> Because LLMs have terrible comprehension of the real world.

That doesn't seem to be the case.

> You: If you put a toddler next to a wig on the floor, which reaches higher? > ChatGPT: ...

I answered it wrong too.

I had to read it, and your reaction to the implied obvious reasoning 3 times, to figure out the implied obvious reasoning, and understand your intent was the toddler was standing and the wig was laying in a heap.

I scored 99.9+% on the SAT and LSAT. I think that implies this isn't some reasoning deficit, lack of familiarity with logical reasoning on my end, or lack of rigor in reasoning.

I have no particular interest in this argument. I think that implies that I'm not deploying motivated reasoning, i.e. it discounts the possibility that I may have experienced it as confusion that required re-reading the entire comment 3 times, but perhaps I had subconscious priors.

Would a toddler even understand the question? (serious question, I'm not familiar with 3 year olds)

Does this shed any light on how we'd work an argument along the lines of our deaf and mute friend typing?

Edit: you edited in some more examples. I found its aubergine answers quite clever! (Ex. notching.) I can't parse out a convincing argument that this is somehow less knowledge than a 3 year old has. It's giving better answers than me that are physical! I thought you'd be sharing it to assert obviously nonphysical answers.

replies(1): >>43373296 #

10. kadushka ◴[15 Mar 25 04:15 UTC] No.43369935{5}[source]▶

>>43369634 #

I just entered your prompt into ChatGPT (4.5):

Prompt:

I'm trying to make a rope, made of only aubergines, not using any twine or string. What kind of knot would you recommend to tie together each aubergine with the next? I'm trying to use up as little of the length of each aubergine as possible.

Response:

Your request doesn't entirely make sense. Aubergines (eggplants) are firm but brittle vegetables; they lack flexibility, elasticity, or tensile strength required to create or sustain knots. Attempting to tie aubergines together directly, without using twine, string, or another flexible medium, isn't feasible. They'd likely break or tear apart under stress.

If your goal is genuinely to form a rope-like structure using only aubergines, you'd be much better off slicing the aubergines into thin, flexible strips (perhaps using a vegetable peeler or mandoline), drying or partially dehydrating them to increase flexibility, and then braiding or twisting them together, rather than knotting.

If you clarify your intended purpose or constraints further, I can suggest alternative methods.

The response looks good to me.

11. kedarkhand ◴[15 Mar 25 15:52 UTC] No.43373296{6}[source]▶

>>43369699 #

> I had to read it, and your reaction to the implied obvious reasoning 3 times, to figure out the implied obvious reasoning, and understand your intent was the toddler was standing and the wig was laying in a heap.

It seems quite obvious even on a cursory glance though!

> toddler was standing and the wig was laying in a heap

I mean, how would a toddler be lying in a heap?

> Would a toddler even understand the question?

Maybe not, I am a teen/early adult myself, so not many children yet :) but if you instead lay those in front of a toddler and ask which is higher, I guess they would answer that, another argument for multi-modality.

PS: Sorry if what I am saying is not clear, English is my third language

12. CamperBob2 ◴[15 Mar 25 22:19 UTC] No.43375505[source]▶

>>43368852 (TP) #

> Mice can survive, forage, reproduce. Reproduce a mammal. There is a whole load of capability not available in an LLM.

And if it stood for "Large Literal Mouse", that might be a meaningful point. The subject is artificial intelligence, and a brief glance at your newspaper, TV, or nearest window will remind you that it doesn't take intelligence to survive, forage, or reproduce.

The mouse comparison is absurd. You might as well criticize an LLM for being bad at putting out a fire, fixing a flat, or holding a door open.

13. bubblyworld ◴[16 Mar 25 16:29 UTC] No.43380234{5}[source]▶

>>43369634 #

Hah, I tried it with gpt-4o and got similarly odd results:

https://chatgpt.com/share/67d6fb93-890c-8004-909d-2bb7962c8f...

It's pretty good nonsense though. It suggests clove hitching them together, which would be a weird (and probably unsafe) thing to do even with ropes!

14. pfisch ◴[17 Mar 25 20:44 UTC] No.43392556{5}[source]▶

>>43369171 #

A 3 year old is actually far more similar to AI than an adult is. 3 year olds have extremely limited context windows. They will almost immediately forget what happened even 20-30 seconds ago when you play a game like memory with them, and they rarely remember what they ate for breakfast or lunch or basically any previous event from the same day.

When a 3 year old says "I love you" it is not at all clear that they understand what that means. They frequently mimic phrases they hear/basically statistical next word guessing and obviously don't understand the meaning of what they are saying.

You can even mimic an inner voice for them, the way DeepSeek does when thinking through a problem, and it massively helps a 3 year old to solve problems.

AI largely acts like a 3 year old with a massive corpus of text floating around in their head compared to the much smaller corpus a 3 year old has.
