
124 points alphadelphi | 3 comments
antirez ◴[] No.43594641[source]
As LLMs do things previously thought impossible, LeCun adjusts his statements about LLMs, but at the same time his credibility gets lower and lower. He started out saying that LLMs were just predicting words using a probabilistic model, basically like a better Markov chain. It was already pretty clear that this was not the case, as even GPT-3 could do summarization well enough, and there is no probabilistic link between the words of a text and the gist of its content; still, he was saying that at the time of GPT-3.5, I believe. Then he adjusted this view when talking with Hinton publicly, saying "I don't deny there is more than just probabilistic thing...". He then started saying: no longer simply probabilistic, but they can only regurgitate things they saw in the training set, often explicitly telling people that novel questions could NEVER be solved by LLMs, with examples of prompts that were failing at the time he was saying that, and so forth. Now reasoning models can solve problems they never saw, and o3 made huge progress on ARC, so he adjusted again: for AGI we will need more. And so forth.

So at this point it does not matter what you believe about LLMs: in general, trusting LeCun's words is not a good idea. Add to this that LeCun is directing an AI lab that at the same time has the following huge issues:

1. Weakest ever LLM among the big labs with similar resources (and smaller resources: DeepSeek).

2. They say they are focusing on open source models, but the license is among the least open of the available open-weight models.

3. LLMs, and in general the whole new AI wave, put CNNs, a field where LeCun worked (but that he didn't start himself), a lot more in perspective, and now it's just a chapter in a book that is composed mostly of other techniques.

Btw, other researchers who were on LeCun's side changed sides recently, saying that now it "is different" because of CoT, which is the symbolic reasoning they were babbling about before. But CoT is still autoregressive next-token prediction without any architectural change, so, no, they were wrong, too.

replies(15): >>43594669 #>>43594733 #>>43594747 #>>43594812 #>>43594852 #>>43595292 #>>43595501 #>>43595519 #>>43595562 #>>43595668 #>>43596291 #>>43596309 #>>43597354 #>>43597435 #>>43614487 #
gcr ◴[] No.43594669[source]
Why is changing one’s mind when confronted with new evidence a negative signifier of reputation for you?
replies(6): >>43594696 #>>43594815 #>>43594919 #>>43595008 #>>43595180 #>>43595298 #
danielmarkbruce ◴[] No.43595008[source]
If you need basically rock-solid evidence of X before you stop saying "this thing cannot do X", then you shouldn't be running a forward-looking lab. There are only so many directions you can take, only so many resources at your disposal. Your intuition has to be really freakishly good to be running such a lab.

He's done a lot of amazing work, but his stance on LLMs seems continuously off the mark.

replies(2): >>43595040 #>>43596502 #
nurettin ◴[] No.43596502[source]
I'm going to wear the tinfoil hat: a firm is able to produce a sought-after behavior a few months later and throws people off. Is it more likely that the firm (worth billions at this point) is engineering these solutions into the model, or is it because of emergent neural network architectural magic?

I'm not saying that they are being bad actors, just saying this is more probable in my mind than an LLM breakthrough.

replies(1): >>43597844 #
1. danielmarkbruce ◴[] No.43597844[source]
It depends what you mean by "engineering these solutions into the model". Using better data leads to better models given the same architecture and training. Nothing wrong with it, it's hard work, and it might be done with a specific goal in mind. LLM "breakthroughs" aren't really a thing at this point. It's just one little thing after another.
replies(1): >>43598430 #
2. nurettin ◴[] No.43598430[source]
Sure, I specifically pre-agreed to it not being ill will. What I mean is keeping tabs on the latest demand (newer benchmarks) and making sure their model delivers in some fashion. But it is mundane and they don't say that. And when a major number increases, people don't assume they just added more specific training data.
replies(1): >>43603251 #
3. danielmarkbruce ◴[] No.43603251[source]
Yup, it's a fair point. We very quickly got down to the nitty gritty with these things. Hopefully, as with semiconductors, the nitty gritty results in a lot of big performance gains for decades.