antirez:
As LLMs do things previously thought to be impossible, LeCun adjusts his statements about them, but at the same time his credibility goes lower and lower. He started by saying that LLMs were just predicting words with a probabilistic model, basically a better Markov chain. It was already pretty clear that this was not the case, since even GPT-3 could do summarization well enough, and there is no probabilistic link between the words of a text and the gist of its content; still, he was saying this around the time of GPT-3.5, I believe. Then he adjusted this view when talking publicly with Hinton, saying "I don't deny there is more than just a probabilistic thing...". The claim became: no longer just probabilistic, but they can only regurgitate things they saw in the training set, with him often explicitly telling people that novel questions could NEVER be solved by LLMs, complete with example prompts that failed at the time he was saying it, and so forth. Now reasoning models solve problems they never saw, and o3 made huge progress on ARC, so he adjusted again: for AGI we will need more. And so forth.
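To make the contrast concrete, here is what "just predicting words with a probabilistic model" means in the literal Markov-chain sense: a minimal bigram sampler (a toy Python sketch with a made-up corpus, not anyone's actual model), which can only re-emit word transitions it has literally counted:

    import random
    from collections import defaultdict, Counter

    def train_bigram(corpus_tokens):
        # Count how often each word follows each other word.
        counts = defaultdict(Counter)
        for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
            counts[prev][nxt] += 1
        return counts

    def sample_next(counts, prev):
        # Sample the next word in proportion to observed bigram counts.
        options = counts[prev]
        if not options:
            return None
        words, weights = zip(*options.items())
        return random.choices(words, weights=weights)[0]

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    model = train_bigram(corpus)
    word, out = "the", ["the"]
    for _ in range(8):
        word = sample_next(model, word)
        if word is None:
            break
        out.append(word)
    print(" ".join(out))

A table lookup like this cannot summarize a paragraph it has never seen; the fact that GPT-3 could is exactly the point.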

So at this point it does not matter what you believe about LLMs: in general, trusting LeCun's word is not a good idea. Add to this that LeCun directs an AI lab that at the same time has the following huge issues:

1. The weakest LLM among the big labs with similar resources (and even among labs with smaller resources: DeepSeek).

2. They say they are focusing on open source models, but their license is among the least open of the available open-weight models.

3. LLMs, and the new AI wave in general, put CNNs, a field where LeCun did a lot of work (but which he did not start himself), much more in perspective: now they are just a chapter in a book composed mostly of other techniques.

Btw, other researchers who were on LeCun's side changed sides recently, saying that now it "is different" because of CoT, which is the symbolic reasoning they were babbling about before. But CoT is still autoregressive next-token prediction without any architectural change, so no, they were wrong too.
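To spell out the "no architectural change" point: chain-of-thought is just more tokens out of the same autoregressive decode loop. Schematically it looks like the Python sketch below (model, tokenizer, and the stop token are placeholders, not any specific library's API):

    def generate(model, tokenizer, prompt, max_new_tokens=512):
        # Plain autoregressive decoding: predict one token, append it, repeat.
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            next_token = model.predict_next(tokens)   # placeholder for forward pass + sampling
            tokens.append(next_token)
            if next_token == tokenizer.eos_id:        # placeholder end-of-sequence id
                break
        return tokenizer.decode(tokens)

    # "Chain of thought" changes the prompt, not the loop or the architecture:
    # generate(model, tokenizer, "Q: ...\nLet's think step by step.\n")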

belter:
But have we established that LLMs don't just interpolate, and that they can create?

Are we able to prove it with output that is:

1) algorithmically novel (not just a recombination),

2) coherent, and

3) not explainable by training data coverage?

No handwaving with scale...

fragmede:
Why is that the bar, though? Imagine LLMs as a kid with a box of Lego containing a hundred million blocks, who can assemble those blocks into any configuration possible. The kid doesn't have access to ABS plastic pellets and a molding machine, so they can't make new pieces; does that really make us think the kid just interpolates and can't create?
belter:
Actually, yes... If the kid spends their whole life in the box and never invents a new block, that's just combinatorics. We don't call a chess engine 'creative' for finding novel moves, because we understand the rules. LLMs have rules too: they're called weights.

I want LLMs to create, but so far every creative output I've seen is just a clever remix of training data. The most advanced models still fail a simple test: restrict the domain, for example "invent a cookie recipe with no flour, sugar, or eggs" or "name a company without using real words". Suddenly their creativity collapses into either nonsense (violating the constraints) or trivial recombination: ChocoNutBake instead of NutellaCookie.

If LLMs could actually create, we’d see emergent novelty, outputs that couldn’t exist in the training data. Instead, we get constrained interpolation.

Happy to be proven wrong. Would like to see examples where an LLM output is impossible to map back to its training data.
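(For what it's worth, the restricted-domain test above is easy to make mechanical. A rough sketch of how one might check the constraint side, assuming the model's recipe is already plain text; the banned-ingredient list is just the one from my example:)

    BANNED = {"flour", "sugar", "egg", "eggs"}

    def violated_constraints(recipe_text):
        # Return any banned ingredient mentioned in the generated recipe.
        words = {w.strip(".,;()").lower() for w in recipe_text.split()}
        return sorted(words & BANNED)

    recipe = "Mix almond butter, oats, and honey; bake for 12 minutes."
    print(violated_constraints(recipe))  # [] -> the constraints were respected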

fragmede:
The combinatorics of choosing 500 pieces (words) out of a bag of 1.8 billion pieces (approx. parameters per layer for GPT-3: 175B across 96 layers) with replacement, where order matters, works out to something like 10^4600. Maybe you can't call that creativity, but you've got to admit that's a pretty big number.
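(Back-of-the-envelope, in case the exponent looks hand-wavy: ordered selection with replacement of 500 items from a pool of 1.8 billion is (1.8e9)^500, which is easiest to compute in log space:)

    import math

    pool, picks = 1.8e9, 500
    log10_count = picks * math.log10(pool)   # log10 of pool ** picks
    print(f"about 10^{log10_count:.0f}")     # about 10^4628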
belter:
I said No handwaving with scale. :-)
fragmede:
Right—but why should “new ABS plastic” be the bar for creativity? If the kid builds a structure no one’s ever imagined, from an unimaginably large box of Lego, isn’t that still novel? Sure, it’s made from known parts—but so is language. So are most human ideas.

The demand for outputs that are provably untraceable to training data feels like asking for magic, not creativity. Even Gödel didn’t require “never seen before atoms” to demonstrate emergence.