Yann LeCun to depart Meta and launch AI startup focused on 'world models'

(www.nasdaq.com)

lm28469 ◴[12 Nov 25 08:06 UTC] No.45897524[source]▶

>>45897271 (OP) #

But wait they're just about to get AGI why would he leave???

replies(1): >>45897571 #

killerstorm ◴[12 Nov 25 08:12 UTC] No.45897571[source]▶

>>45897524 #

LeCun always said that LLMs do not lead to AGI.

replies(2): >>45897613 #>>45897683 #

1. NitpickLawyer ◴[12 Nov 25 08:27 UTC] No.45897683[source]▶

>>45897571 #

He also said other things about LLMs that turned out to be either wrong or easily bypassed with some glue. While I understand where he comes from, and that his stance is pure research-y theory driven, at the end of the day his positions were wrong.

Previously, he very publicly and strongly said:

a) LLMs can't do math. They trick us in poetry but that's subjective. They can't do objective math.

b) they can't plan

c) by the very nature of autoregressive arch, errors compound. So the longer you go in your generation, the higher the error rate. So at long contexts the answers become utter garbage.
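For readers unfamiliar with the argument in (c), here is a minimal sketch of how it is usually stated. It assumes every token goes wrong independently with the same fixed probability, which is a simplifying assumption for illustration and not a measured property of any real model. The function name and the 1% error rate are hypothetical.

```python
def p_sequence_correct(per_token_error: float, n_tokens: int) -> float:
    """Chance that all n tokens are right, ASSUMING independent, identical errors."""
    return (1.0 - per_token_error) ** n_tokens


# Illustrative numbers only: a hypothetical 1% per-token error rate.
for n in (10, 100, 1000):
    print(n, p_sequence_correct(0.01, n))
```

Under that assumption the chance of a fully correct sequence shrinks geometrically with length. Whether the independence assumption holds for trained models is exactly what the rest of the thread disputes.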

All of these were proven wrong, 1-2 years later. "a" at the core (gold at IMO), "b" w/ software glue and "c" with better training regimes.

I'm not interested in the will-it-won't-it debates about AGI, I'm happy with what we have now, and I think these things are good enough now, for several use cases. But it's important to note when people making strong claims get them wrong. Again, I think I get where he's coming from, but the public stances aren't the place to get into the deep research minutiae.

That being said, I hope he gets to find whatever it is that he's looking for, and wish him success in his endeavours. Between him, Fei-Fei Li and Ilya, something cool has to come out of the small shops. Heck, I'm even rooting for the "let's commoditise LoRA training" that Mira's startup seems to go for.

replies(3): >>45897933 #>>45898169 #>>45905642 #

2. ilaksh ◴[12 Nov 25 09:01 UTC] No.45897933[source]▶

>>45897683 (TP) #

That's true but I also think despite being wrong about the capabilities of LLMs, LeCun has been right in that variations of LLMs are not an appropriate target for long term research that aims to significantly advance AI. Especially at the level of Meta.

I think transformers have been proven to be general purpose, but that doesn't mean that we can't use new fundamental approaches.

To me it's obvious that researchers are acting like sheep as they always do. He's trying to come up with a real innovation.

LeCun has seen how new paradigms have taken over. Variations of LLMs are not the type of new paradigm that serious researchers should be aiming for.

I wonder if there can be a unification of spatial-temporal representations and language. I am guessing diffusion video generators already achieve this in some way. But I wonder if new techniques can improve the efficiency and capabilities.

I assume the Nested Learning stuff is pretty relevant.

Although I've never totally grokked transformers and LLMs, I always felt that MoE was the right direction and besides having a strong mapping or unified view of spatial and language info, there also should somehow be the capability of representing information in a non-sequential way. We really use sequences because we can only speak or hear one sound at a time. Information in general isn't particularly sequential, so I doubt that's an ideal representation.

So I guess I am kind of proposing variations of transformers myself, to be honest.

But besides being able to convert between sequential discrete representations and less discrete non-sequential representations (maybe you have tokens but every token has a scalar attached), there should be lots of tokenizations, maybe for each expert. Then you have experts that specialize in combining and translating between different scalar-token tokenizations.

Like automatically clustering problems or world model artifacts or something and automatically encoding DSLs for each sub problem.

I wish I really understood machine learning.

3. tonii141 ◴[12 Nov 25 09:35 UTC] No.45898169[source]▶

>>45897683 (TP) #

a) Still true: vanilla LLMs can’t do math, they pattern-match unless you bolt on tools.

b) Still true: next-token prediction isn’t planning.

c) Still true: error accumulation is mitigated, not eliminated. Long-context quality still relies on retrieval, checks, and verifiers.

Yann’s claims were about LLMs as LLMs. With tooling, you can work around limits, but the core point stands.

replies(2): >>45898248 #>>45898683 #

4. NitpickLawyer ◴[12 Nov 25 09:49 UTC] No.45898248[source]▶

>>45898169 #

a) no, Gemini 2.5 was shown to "win" gold w/o tools: https://arxiv.org/html/2507.15855v1

b) reductionism isn't worth our time. Planning works in the real world, today. (try any agentic tool like cc/codex/whatever). And if you're set on the purist view, there's mounting evidence from Anthropic that there is planning in the core of an LLM.

c) so ... not true? Long context works today.

This is simply moving goalposts and nothing more. X can't do Y -> well, here they are doing Y -> well, not like that.

replies(1): >>45898433 #

5. tonii141 ◴[12 Nov 25 10:16 UTC] No.45898433{3}[source]▶

>>45898248 #

a) That "no-tools" win depends on prompt orchestration which can still be categorized as tooling.

b) Next-token training doesn’t magically grant inner long-horizon planners.

c) Long context ≠ robust at any length. Degradation with scale remains.

Not moving goalposts, just keeping terms precise.

replies(1): >>45899019 #

6. killerstorm ◴[12 Nov 25 10:59 UTC] No.45898683[source]▶

>>45898169 #

My man, math is pattern matching, not magic. So is logic. And computation.

Please learn the basics before you discuss what LLMs can and can't do.

replies(1): >>45899359 #

7. ACCount37 ◴[12 Nov 25 11:51 UTC] No.45899019{4}[source]▶

>>45898433 #

My man, you're literally moving all the goalposts as we speak.

It's not just "long context" - you demand "infinite context" and "any length" now. Even humans don't have that. "No tools" is no longer enough - what, do you demand "no prompts" now too? Having LLMs decompose tasks and prompt each other the way humans do is suddenly a no-no?

replies(1): >>45899469 #

8. ozgrakkurt ◴[12 Nov 25 12:31 UTC] No.45899359{3}[source]▶

>>45898683 #

I'm no expert on math but "math is pattern matching" really sounds wrong.

Maybe programming is mostly pattern matching but modern math is built on theory and proofs right?

replies(2): >>45900035 #>>45905753 #

9. tonii141 ◴[12 Nov 25 12:45 UTC] No.45899469{5}[source]▶

>>45899019 #

I’m not demanding anything, I’m pointing out that performance tends to degrade as context scales, which follows from current LLM architectures as autoregressive models.

In that sense, Yann was right.

replies(1): >>45901699 #

10. noddybear ◴[12 Nov 25 13:40 UTC] No.45900035{4}[source]▶

>>45899359 #

Nah, it's all pattern matching. This is how automated theorem provers like Isabelle are built, applying operations to lemmas/expressions to reach proofs.

replies(2): >>45900776 #>>45901563 #

11. staticman2 ◴[12 Nov 25 14:38 UTC] No.45900776{5}[source]▶

>>45900035 #

I'm sure if you pick a sufficiently broad definition of pattern matching your argument is true by definition!

Unfortunately that has nothing to do with the topic of discussion, which is the capabilities of LLMs, which may require a narrower definition of pattern matching.

12. vbarrielle ◴[12 Nov 25 15:44 UTC] No.45901563{5}[source]▶

>>45900035 #

Automated theorem provers are also built around backtracking, which is absent in LLMs.
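To make "backtracking" concrete, here is a toy sketch of backward chaining over Horn-style rules. It is not how Isabelle or any real prover is implemented; the rule table, the goal names and the `prove` function are all made up for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical knowledge base: each goal maps to alternative lists of subgoals.
RULES: Dict[str, List[List[str]]] = {
    "c": [["a", "x"], ["a", "b"]],  # first alternative is a dead end ("x" has no rule)
    "b": [["a"]],
    "a": [[]],                      # a fact: provable with no subgoals
}


def prove(goal: str, rules: Dict[str, List[List[str]]],
          seen: FrozenSet[str] = frozenset()) -> bool:
    """Depth-first proof search that backtracks over alternative rule bodies."""
    if goal in seen:  # already trying to prove this goal further up: avoid looping
        return False
    for body in rules.get(goal, []):
        if all(prove(sub, rules, seen | {goal}) for sub in body):
            return True
        # This alternative failed: abandon it and try the next one (the backtrack).
    return False


print(prove("c", RULES))  # the search abandons ["a", "x"] and succeeds via ["a", "b"]
```

The point of the sketch is the loop over alternatives: when one branch fails, the search discards it and returns to an earlier choice point.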

13. snapcaster ◴[12 Nov 25 15:55 UTC] No.45901699{6}[source]▶

>>45899469 #

Not sure if you're just someone who doesn't want to ever lose an argument or you're actually coping this hard

14. HarHarVeryFunny ◴[12 Nov 25 20:04 UTC] No.45905642[source]▶

>>45897683 (TP) #

> So the longer you go in your generation, the higher the error rate. So at long contexts the answers become utter garbage.

Not totally wrong. They can self-correct, but it seems context rot will eventually set in.

15. HarHarVeryFunny ◴[12 Nov 25 20:09 UTC] No.45905753{4}[source]▶

>>45899359 #

When an LLM does it, it's pattern matching.

RL training amounts to pattern matching.

How does an LLM decode Base64? Decode algorithm? No - predictive pattern matching.

An LLM isn't predicting what a person thinks - it's predicting what a person does.
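For contrast with the Base64 point above, this is roughly what an explicit decode algorithm looks like: map each character to a 6-bit value, concatenate the bits, and regroup them into bytes. It is a minimal sketch assuming the standard alphabet and well-formed input; `b64decode_manual` is a name invented here, and the standard library's `base64.b64decode` is used only to cross-check it.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64decode_manual(text: str) -> bytes:
    """Decode well-formed standard Base64 by explicit bit regrouping."""
    text = text.rstrip("=")  # padding carries no data
    # Each character stands for 6 bits.
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in text)
    usable = len(bits) - len(bits) % 8  # drop leftover bits that do not fill a byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


# Cross-check against the standard library on one sample.
sample = "aGVsbG8="
assert b64decode_manual(sample) == base64.b64decode(sample)
```

Nothing here says how an LLM actually arrives at a decoded string; it only shows the procedure the comment is contrasting prediction against.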
