antirez ◴[] No.43594641[source]
As LLMs do things that were thought to be impossible before, LeCun adjusts his statements about LLMs, but at the same time his credibility goes lower and lower. He started by saying that LLMs were just predicting words using a probabilistic model, basically like a better Markov chain. It was already pretty clear that this was not the case, since even GPT-3 could do summarization well enough, and there is no probabilistic link between the words of a text and the gist of its content; still, he was saying that around the time of GPT-3.5, I believe. Then he adjusted this view when talking with Hinton publicly, saying "I don't deny there is more than just a probabilistic thing...". His position became: no longer simply probabilistic, but they can only regurgitate things they saw in the training set, often explicitly telling people that novel questions could NEVER be solved by LLMs, with examples of prompts that were failing at the time he said it, and so forth. Now reasoning models can solve problems they never saw, and o3 made huge progress on ARC, so he adjusted again: for AGI we will need more. And so forth.

So at this point it does not matter what you believe about LLMs: in general, trusting LeCun's words is not a good idea. Add to this that LeCun is directing an AI lab that, at the same time, has the following huge issues:

1. The weakest LLM among the big labs with similar resources (and even compared to labs with smaller resources: DeepSeek).

2. They say they are focusing on open source models, but their license is among the least open of the available open-weight models.

3. LLMs, and the new AI wave in general, put CNNs, a field where LeCun worked a lot (but did not start himself), into perspective: now it's just one chapter in a book composed mostly of other techniques.

Btw, other researchers who were on LeCun's side changed sides recently, saying that now it "is different" because of CoT, which is the symbolic reasoning they were blabbing about before. But CoT is still autoregressive next-token prediction without any architectural change, so, no, they were wrong too.
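
To make that last point concrete, here is a minimal sketch in Python (toy stand-in model and hypothetical names, not any real API) of why CoT is not an architectural change: the "reasoning" tokens come out of exactly the same next-token loop, just with more context in front of the answer.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB_SIZE = 50

    def toy_model(tokens):
        # Stand-in for a transformer forward pass: returns next-token logits.
        return rng.normal(size=VOCAB_SIZE)

    def generate(model, prompt_tokens, max_new_tokens):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            logits = model(tokens)            # same forward pass, CoT or not
            tokens.append(int(np.argmax(logits)))
        return tokens

    # Chain-of-thought only changes what goes into prompt_tokens ("think step
    # by step", few-shot examples, ...) and how many tokens are generated
    # before the final answer; the loop itself is untouched.
    print(generate(toy_model, prompt_tokens=[1, 2, 3], max_new_tokens=5))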

replies(15): >>43594669 #>>43594733 #>>43594747 #>>43594812 #>>43594852 #>>43595292 #>>43595501 #>>43595519 #>>43595562 #>>43595668 #>>43596291 #>>43596309 #>>43597354 #>>43597435 #>>43614487 #
gcr ◴[] No.43594669[source]
Why is changing one’s mind when confronted with new evidence a negative signifier of reputation for you?
replies(6): >>43594696 #>>43594815 #>>43594919 #>>43595008 #>>43595180 #>>43595298 #
antirez ◴[] No.43594696[source]
Because there was plenty of evidence that the statements were either not correct or not based on enough information at the time they were made. And being wrong because of personal biases, and then not clearly stating you were wrong when new evidence appeared, is not a trait of a good scientist. For instance: the strong summarization abilities were already something that, alone, without any further information, was enough to seriously doubt the stochastic parrot mental model.
replies(4): >>43594725 #>>43594765 #>>43594771 #>>43595670 #
jxjnskkzxxhx ◴[] No.43594765[source]
I don't see the contradiction between "stochastic parrot" and "strong summarisation abilities".

Where I'm skeptical of LLM skepticism is that people use the term "stochastic parrot" disparagingly, as if they're not impressed. LLMs are stochastic parrots in the sense that they probabilistically guess sequences of things, but isn't it interesting how far that takes you already? I'd never have guessed. Fundamentally I question the intellectual honesty of anyone who pretends they're not surprised by this.

replies(2): >>43594813 #>>43595232 #
antirez ◴[] No.43594813{3}[source]
LLMs learn from examples where the target is not a probability distribution but the token that actually continues a given sentence (only one token is set to 1). So they don't learn probabilities, they learn how to continue the sentence with a given token. We apply softmax to the logits for mathematical reasons, and it is natural/simpler to think in terms of probabilities, but that's not what happens, nor are the neural networks they are composed of limited to approximating probabilistic functions. This "next token probability" framing is the source of a lot of misunderstanding. It's much better to imagine the logits as "to continue my reply I could say this word, more than the others, or maybe that one, a bit less, ..." and so forth. There is evidence, too, that in the activations producing a given token the LLM already has an idea of how most of the sentence is going to continue.
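
To make this concrete, a minimal sketch in numpy (toy vocabulary and purely illustrative numbers) of what one training example actually provides: the index of the token that really came next, not a measured distribution.

    import numpy as np

    def softmax(logits):
        z = np.exp(logits - logits.max())   # shift for numerical stability
        return z / z.sum()

    # Hypothetical logits over a 5-token vocabulary at one position.
    logits = np.array([2.1, -0.3, 0.7, 4.5, -1.2])

    # The training data only says: the sentence continued with token 3.
    target_index = 3

    # Softmax + cross-entropy make the loss differentiable and well-behaved;
    # the target itself is a single token (a one-hot vector), not a
    # probability distribution observed anywhere.
    loss = -np.log(softmax(logits)[target_index])
    print(loss)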

Of course, early in training, the first function they model to lower the error will be something like the raw probabilities of the next tokens, since that is the simplest function that reduces the loss. Then the gradients push in other directions, and the function the LLM eventually learns is no longer about probabilities, but about the meaning of the sentence and what it makes sense to say next.

It's not by chance that the logits often concentrate a huge signal in just two or three tokens, even if the sentence, probabilistically speaking, could continue in many more ways.
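
To illustrate that concentration (again with toy, made-up numbers): applying softmax to a typical logit vector puts almost all of the mass on two or three tokens, even though every token in the vocabulary gets a nonzero value.

    import numpy as np

    def softmax(logits):
        z = np.exp(logits - logits.max())
        return z / z.sum()

    logits = np.array([6.0, 5.5, 1.0, 0.5, -2.0, -2.5])   # hypothetical
    probs = softmax(logits)
    print(probs.round(3))              # [0.618 0.375 0.004 0.003 0.    0.   ]
    print(np.sort(probs)[-2:].sum())   # the top two tokens hold ~0.99 of the mass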

replies(4): >>43594882 #>>43594975 #>>43595199 #>>43595490 #
jxjnskkzxxhx ◴[] No.43594882{4}[source]
I don't think the difference between "they learn probabilities" vs "they learn how they want a sentence to continue" is material. It seems like an implementation detail to me. In fact, you can add a temperature, set it to zero, and the model becomes deterministic, so no probabilities anywhere. The fact is, they learn from examples of sequences and are very good at finding patterns in those sequences, to the point that they "sound human".
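
A minimal sketch of that temperature point (numpy, toy values, not a real decoding library): dividing the logits by a temperature T reshapes the distribution, and at T = 0 sampling collapses to a plain argmax, i.e. fully deterministic.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(logits, temperature):
        if temperature == 0:
            return int(np.argmax(logits))      # greedy: no randomness at all
        z = logits / temperature
        p = np.exp(z - z.max())
        p /= p.sum()
        return int(rng.choice(len(logits), p=p))

    logits = np.array([2.0, 1.8, -1.0])        # hypothetical next-token logits
    print(sample(logits, temperature=1.0))     # stochastic pick
    print(sample(logits, temperature=0.0))     # always index 0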

But the point of my response was just that I find it extremely surprising how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human, and I'm suspicious of anyone who pretends this isn't incredible. Can we agree on this?

replies(1): >>43595549 #
balamatom ◴[] No.43595549{5}[source]
I don't find anything surprising about that. What humans generally see of each other is little more than outer shells that are made out of sequenced linguistic patterns. They generally find that completely sufficient.

(All things considered, you may be right to be suspicious of me.)

replies(1): >>43596351 #
jxjnskkzxxhx ◴[] No.43596351{6}[source]
Nah, to me you're just an average person on the internet. If the recent developments don't surprise you, I just chalk it up to lack of curiosity. I'm well aware that people like you exist; most people are like that, in fact. My comment was referring to experts specifically.
replies(1): >>43597036 #
balamatom ◴[] No.43597036[source]
>how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human

What surprises me is the assumption that there's more than "find patterns in sequences" to "sounding human" i.e. to emitting human-like communication patterns. What else could there be to it? It's a tautology.

>If the recent developments don't surprise you, I just chalk it up to lack of curiosity.

Recent developments don't surprise me in the least. I am, however, curious enough to be absolutely terrified by them. For one, behind the human-shaped communication sequences there could previously be assumed to be an actual human.