385 points vessenes | 19 comments

So, LeCun has been quite public saying that he believes LLMs will never fix hallucinations because, essentially, the token choice method at each step leads to runaway errors -- these can't be damped mathematically.

In exchange, he offers the idea that we should have something that is an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and if there's any engineering done around it. I can't find much after the release of I-JEPA from his group.

ActorNightly ◴[] No.43325670[source]
Not an official ML researcher, but I do happen to understand this stuff.

The problem with LLMs is that the output is inherently stochastic - i.e there isn't a "I don't have enough information" option. This is due to the fact that LLMs are basically just giant look up maps with interpolation.

Energy minimization is more of an abstract approach where you can use architectures that don't rely on things like differentiability. True AI won't be solely feedforward architectures like current LLMs. To give an answer, they will basically determine an algorithm on the fly that includes computation and search. To learn that algorithm (or algorithm parameters), at training time, you need something that doesn't rely on continuous values, but still converges to the right answer. So instead you assign a fitness score, like memory use or compute cycles, and differentiate based on that. This is basically how search works with genetic algorithms or PSO.

replies(10): >>43365410 #>>43366234 #>>43366675 #>>43366830 #>>43366868 #>>43366901 #>>43366902 #>>43366953 #>>43368585 #>>43368625 #
seanhunter ◴[] No.43365410[source]
> The problem with LLMs is that the output is inherently stochastic - i.e there isn't a "I don't have enough information" option. This is due to the fact that LLMs are basically just giant look up maps with interpolation.

I don't think this explanation is correct. The input to the decoder at the end of all the attention heads etc (as I understand it) is a probability distribution over tokens. So the model as a whole does have an ability to score low confidence in something by assigning it a low probability.

The problem is that thing is a token (part of a word). So the LLM can say "I don't have enough information" to decide on the next part of a word but has no ability to say "I don't know what on earth I'm talking about" (in general - not associated with a particular token).
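
A minimal sketch of the per-token confidence described above, assuming made-up logits over a four-token vocabulary (the numbers and function names are illustrative, not taken from any real model). It shows that the distribution does carry a "how sure am I about the next token" signal, which is a different thing from a statement about the whole answer:

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over tokens."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in bits; higher means less certainty about the next token."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical logits (assumed values for illustration only).
confident = softmax([8.0, 1.0, 0.5, 0.2])   # one token dominates
uncertain = softmax([1.0, 1.0, 1.0, 1.0])   # all four tokens equally likely

print(entropy(confident), entropy(uncertain))
```

The entropy here is only a statement about one step of generation; nothing in it says whether the response as a whole is grounded.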

replies(5): >>43365608 #>>43365655 #>>43365953 #>>43366351 #>>43366485 #
1. Lerc ◴[] No.43366485[source]
I feel like we're stacking naive misinterpretations of how LLMs function on top of one another here. Grasping gradient descent and autoregressive generation can give you a false sense of confidence. It is like knowing how transistors make up logic gates and believing you know more about CPU design than you actually do.

Rather than inferring from how you imagine the architecture working, you can look at examples and counterexamples to see what capabilities they have.

One misconception is that predicting the next word means there is no internal idea on the word after next. The simple disproof of this is that models put 'an' instead of 'a' ahead of words beginning with vowels. It would be quite easy to detect (and exploit) behaviour that decided to use a vowel word just because it somewhat arbitrarily used an 'an'.

Models predict the next word, but they don't just predict the next word. They generate a great deal of internal information in service of that goal. Placing limits on their abilities by assuming the output they express is the sum total of what they have done is a mistake. The output probability is not what it thinks, it is a reduction of what it thinks.

One of Andrej Karpathy's recent videos talked about how researchers showed that models do have an internal sense of not knowing the answer, but fine tuning on question answering did not give them the ability to express that knowledge. Finding information the model did and didn't know, then fine tuning it to say "I don't know" for cases where it had no information, allowed the model to generalise and express "I don't know".
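
As I understand that procedure, it could be sketched roughly like this. Everything here is an assumption for illustration: `ask_model` is a hypothetical stand-in for a real model call, the substring match is a crude correctness check, and the actual recipe described in the video may differ.

```python
import random

# Several phrasings, so the target is the idea of not knowing rather than one fixed string.
REFUSALS = ["I don't know.", "I'm not sure about that.", "I don't have that information."]

def build_examples(qa_pairs, ask_model, tries=3, rng=None):
    """Probe the model several times per question; if it never produces the
    true answer, use a refusal as the fine-tuning target instead."""
    rng = rng or random.Random(0)
    examples = []
    for question, truth in qa_pairs:
        answers = [ask_model(question) for _ in range(tries)]
        knows = any(truth.lower() in a.lower() for a in answers)
        target = truth if knows else rng.choice(REFUSALS)
        examples.append({"prompt": question, "completion": target})
    return examples

def fake_model(question):
    """Hypothetical model: knows one fact and confidently guesses otherwise."""
    known = {"Capital of France?": "Paris"}
    return known.get(question, "Probably Springfield")

pairs = [("Capital of France?", "Paris"), ("Capital of Freedonia?", "Fredonia City")]
examples = build_examples(pairs, fake_model)
```

The interesting claim is not this data preparation step but what happens afterwards: the refusals reportedly generalise to questions that were never in the fine-tuning set.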

replies(6): >>43366739 #>>43367815 #>>43367895 #>>43368796 #>>43371175 #>>43373293 #
2. littlestymaar ◴[] No.43366739[source]
Not an ML researcher or anything (I'm basically only a few Karpathy videos into ML, so please someone correct me if I'm misunderstanding this), but it seems that you're getting this backwards:

> One misconception is that predicting the next word means there is no internal idea on the word after next. The simple disproof of this is that models put 'an' instead of 'a' ahead of words beginning with vowels.

My understanding is that there's simply no “'an' ahead of a word that starts with a vowel”: the model (or more accurately, the sampler) picks “an” and then the model will never predict a word that starts with a consonant after that. It's not like it “knows” in advance that it wants to put a word with a vowel and then anticipates that it needs to put “an”; it generates a probability for both tokens “a” and “an”, picks one, and then when it generates the following token, it will necessarily take its previous choice into account and never put a word starting with a vowel after it has already chosen “a”.
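
To make the mechanism I'm describing concrete, here is a toy sketch, assuming a hand-written lookup table in place of a real model (the table, its probabilities and the function names are all made up). The article is sampled first, and the next step is conditioned on whatever was picked:

```python
import random

# Hypothetical next-token table: last token -> {candidate: probability}.
NEXT = {
    "is": {"a": 0.5, "an": 0.5},
    "a": {"crocodile": 0.7, "lizard": 0.3},
    "an": {"alligator": 0.9, "iguana": 0.1},
}

def sample(dist, rng):
    """The sampler: pick one token according to the given probabilities."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(start, steps, rng):
    """Autoregressive loop: every chosen token is fed back in as context."""
    out = [start]
    for _ in range(steps):
        dist = NEXT.get(out[-1])
        if dist is None:  # no continuation known for this token
            break
        out.append(sample(dist, rng))
    return out

print(generate("is", 2, random.Random(0)))
```

In this toy the agreement between article and noun comes entirely from conditioning on the previous token; whether a real model works only this way is exactly what the thread is arguing about.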

replies(3): >>43367069 #>>43368302 #>>43377625 #
3. yunwal ◴[] No.43367069[source]
The model still has some representation of whether the word after an/a is more likely to start with a vowel or not when it outputs a/an. You can trivially understand this is true by asking LLMs to answer questions with only one correct answer.

"The animal most similar to a crocodile is:"

https://chatgpt.com/share/67d493c2-f28c-8010-82f7-0b60117ab2...

It will always say "an alligator". It chooses "an" because somewhere in the next word predictor it has already figured out that it wants to say alligator when it chooses "an".

If you ask the question the other way around, it will always answer "a crocodile" for the same reason.
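
A toy way to picture the claim, assuming the model holds some belief over the noun it is heading towards (the numbers and the spelling rule are invented for illustration, and a real transformer does not compute it this explicitly):

```python
# Hypothetical belief over the noun the model is heading towards (assumed numbers).
noun_belief = {"alligator": 0.92, "caiman": 0.05, "iguana": 0.03}

def article_for(noun):
    """Crude spelling rule, good enough for this toy vocabulary."""
    return "an" if noun[0] in "aeiou" else "a"

def article_distribution(belief):
    """Sum the noun probabilities per article to get P(article)."""
    dist = {"a": 0.0, "an": 0.0}
    for noun, p in belief.items():
        dist[article_for(noun)] += p
    return dist

dist = article_distribution(noun_belief)
print(dist)
```

Under this picture "an" is likely because of what is expected to follow it, which is the opposite direction of causation from picking the article first and fitting a noun to it.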

replies(1): >>43367196 #
4. littlestymaar ◴[] No.43367196{3}[source]
Again, that's not a good example I think because everything about the answer is in the prompt, so obviously from the start the "alligator" is high, but then it's just waiting for an "an" to occur to have an occasion to put that.

That doesn't mean it knows "in advance" what it wants to say, it's just that at every step the alligator is lurking in the logits because it directly derives from the prompt.

replies(1): >>43367750 #
5. metaxz ◴[] No.43367750{4}[source]
You write: "it's just that at every step the alligator is lurking in the logits because it directly derives from the prompt" - but isn't that the whole point: at the moment the model writes "an", it isn't just spitting out a random article (or a 50/50 distribution of articles or other words for that matter); rather, "an" gets a high probability because the model internally knows that "alligator" is the correct thing after that. While it can only emit one token in this step, it will emit "an" to make it consistent with its alligator knowledge "lurking". And btw while not even directly relevant, the word alligator isn't in the prompt. Sure, it derives from the prompt but so does everything an LLM generates, and same for any other AI mechanism for generating answers.
replies(1): >>43369344 #
6. metaxz ◴[] No.43367815[source]
Thanks for writing this so clearly... I hear wrong/misguided arguments like we see here every day from friends, colleagues, "experts in the media" etc.

It's strange because just a moment of thinking will show that such ideas are wrong or paint a clearly incomplete picture. And there's plenty of analogies to the dangers of such reductionism. It should be obviously wrong to anyone who has at least tried ChatGPT.

My only explanation is that a denial mechanism must be at play. It simply feels more comfortable to diminish LLM capabilities and/or feel that you understand them from reading a Medium article on transformer networks, than to consider the consequences in terms of the inner black-box nature.

7. ◴[] No.43367895[source]
8. Lerc ◴[] No.43368302[source]
yunwal has provided one example. Here's another using a much smaller model.

https://chat.groq.com/?prompt=If+a+person+from+Ontario+or+To...

The response "If a person from Ontario or Toronto is a Canadian, a person from Sydney or Melbourne would be an Australian!"

It seems mighty unlikely that it chose Australian as the country because of the 'an', or that it chose to put the 'an' at that point in the sentence for any other reason than that the word Australian was going to be next.

For any argument that you think shows this does not mean that they have some idea of what is to come, try and come up with a test to see if your hypothesis is true or not, then give that test a try.

9. jkhdigital ◴[] No.43368796[source]
I think your analogy about logic gates vs. CPUs is spot on. Another apt analogy would be missing the forest for the trees—the model may in fact be generating a complete forest, but its output (natural language) is inherently serial so it can only plant one tree at a time. The sequence of distributions that is the proximate driver of token selection is just the final distillation step.
10. littlestymaar ◴[] No.43369344{5}[source]
> While it can only emit one token in this step, it will emit "an" to make it consistent with its alligator knowledge "lurking".

It will also emit "a" from time to time without issue though, but will never spit out "alligator" right after that, that's it.

> Sure, it derives from the prompt but so does everything an LLM generates, and same for any other AI mechanism for generating answers.

Not really, because of the autoregressive nature of LLMs, the longer the response the more it will depend on its own response rather than the prompt. That's why you can see totally opposite responses from an LLM to the same query if you aren't asking basic factual questions. I saw a tool on reddit a few months ago that allowed you to see which words in the generation were the most “opinionated” (where the sampler had to choose between alternative words that were close in probability), and it was easy to see that you could dramatically affect the result by just changing certain words.
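
I don't know how that tool was implemented, but one plausible way to flag such close calls is to look at the gap between the two most likely tokens at each step. A sketch under that assumption, with invented per-step distributions and an arbitrary threshold:

```python
def top_two_margin(dist):
    """Gap between the two most likely tokens; a small gap marks a close call."""
    ranked = sorted(dist.values(), reverse=True)
    return ranked[0] - ranked[1]

def close_calls(steps, threshold=0.1):
    """Indices of steps where the sampler chose between near-tied alternatives."""
    return [i for i, dist in enumerate(steps) if top_two_margin(dist) < threshold]

# Hypothetical per-step token distributions (assumed values, not from a real model).
steps = [
    {"The": 0.97, "A": 0.03},
    {"policy": 0.41, "plan": 0.38, "idea": 0.21},
    {"is": 0.90, "was": 0.10},
    {"good": 0.36, "bad": 0.34, "mixed": 0.30},
]

print(close_calls(steps))
```

In this made-up example the last step is where the whole response could flip: once "good" or "bad" is sampled, everything after it is conditioned on that choice.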

> "an" gets a high probability because the model internally knows that "alligator" is the correct thing after that.

This is true, though it only works with this kind of prompt because the output of the LLM has little impact on the generation.

Globally I see what you mean, and I don't disagree with you, but at the same time, I think that saying that LLMs have a sense of anticipating the further token misses their ability to get driven astray by their own output: they have some information that will affect further tokens but any token that gets spit out can, and will, change that information in a way that can dramatically change the “plans”. And that's why I think using trivial questions isn't a good illustration, because it sweeps this effect under the rug.

11. flamedoge ◴[] No.43371175[source]
It literally doesn't know how to handle 'I don't know' and needs to be taught. Fascinating.
replies(1): >>43371338 #
12. Lerc ◴[] No.43371338[source]
I think it would be more accurate to say that, after fine tuning on a series of questions with answers, it thinks that you don't want to hear "I don't know".
replies(1): >>43374524 #
13. cruffle_duffle ◴[] No.43373293[source]
> It would be quite easy to detect (and exploit) behaviour that decided to use a vowel word just because it somewhat arbitrarily used an 'an'.

That is a very interesting observation!

Doesn’t that internal state get blown away and recreated for every “next token”? Isn’t the output always the previous context plus the new token, which gets fed back and out pops the new token? There is no transfer of internal state to the new iteration beyond what is “encoded” in its input tokens?

replies(1): >>43373930 #
14. Lerc ◴[] No.43373930[source]
>Doesn’t that internal state get blown away and recreated for every “next token”

That is correct. When a model has a good idea of the next 5 words, after it has emitted the first of those 5, most architectures make no further use of the other 4 and regenerate what is likely the same information again in the next inference cycle.

There are architectures that don't discard all that information but the standard LLM has generally outperformed them, for now.

There are interesting philosophical implications if LLMs were to advance to a level to be considered sentient. Would it not be constantly creating and killing a thinking being for every token? On the other hand, if context is considered memory, perhaps continuity of identity is based upon memory and all that other information is simply forgotten idle thoughts. We have no concept of what our previous thoughts were except from our memory. Is that not the same?

Sometimes I wonder if some of the resistance to AI is because it can do things that we think require abilities that we would like to believe that we possess ourselves, and showing that they are not necessary creates the possibility that we might not have those abilities.

There was a great observation recently in an interview (I forget the source, but the interviewer's last name was Bi) that some of the discoveries that met the most resistance in history such as the Earth orbiting the Sun, or Darwin's theory of evolution were similar in that they implied that we are not a unique special case.

15. kerkeslager ◴[] No.43374524{3}[source]
I think it's more fundamental than that. If you start saying "it thinks" in regards to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily.

If the training data contained a bunch of answers to questions which were simply "I don't know", you could get an LLM to say "I don't know" but that's still not actually a concept of not knowing. That's just knowing that the answer to your question is "I don't know".

It's essentially like if you had an HTTP server that responded to requests for nonexistent documents with a "200 OK" containing "Not found". It's fundamentally missing the "404 Not found" concept.
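
To spell the analogy out, a minimal sketch with two hypothetical handlers (the function names and the document store are made up, and no real server is involved):

```python
DOCUMENTS = {"/index.html": "<h1>Home</h1>"}  # hypothetical document store

def soft_404(path):
    """Always claims success; the failure is only visible in the body text."""
    body = DOCUMENTS.get(path, "Not found")
    return 200, body

def real_404(path):
    """Signals the failure out of band, in the status code."""
    if path in DOCUMENTS:
        return 200, DOCUMENTS[path]
    return 404, "Not found"

print(soft_404("/missing"), real_404("/missing"))
```

Both return the same words for a missing page, but only the second one carries a machine-readable signal that nothing was found.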

LLMs just have a bunch of words--they don't understand what the words mean. There's no metacognition going on for it to think "I don't know" for it to even think you would want to know that.

replies(1): >>43375507 #
16. Lerc ◴[] No.43375507{4}[source]
>I think it's more fundamental than that. If you start saying "it thinks" in regards to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily.

I'm not sure if this objection is terribly helpful. We use terms like think and want to describe processes that clearly do not involve any form of understanding. Electrons do not have motivations but they 'want' to go to a lower energy level in an atom. You can hold down the trigger for the fridge light to make it 'think' that the door has not been opened. These are uncontentious phrases that convey useful ideas.

I understand that when people are working towards producing reasoning machines the words might be working in similar spaces, but really when someone is making claims about machines having awareness, understanding, or thinking they make it quite clear about the context that they are talking about.

As to the rest of your comment, I simply disagree. If you think of a concept as an internal representation of a piece of information, then it has been shown that they do have such representations. In the Karpathy video I mentioned he talks about how researchers found that models did have an internal representation of not knowing, but that the fine tuning was restricting it to providing answers. They gave it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know. This generalised to provide "I don't know" for examples that were not in the training data. For the fine tuning examples to succeed in that, it requires the model to already contain the concept.

I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand I would also think that this also applies to humans, most people are not philosophers.

I think the fact that models can express details about words shows that they do have detailed information about what each word means semantically. In many respects because of tokenisation indexing embeddings it would perhaps be more accurate to say that they have a better understanding of the semantic information of what words mean than of what the words actually are. This is why they are poor at spelling but can give you detailed information about the thing they can't spell.

replies(1): >>43381331 #
17. numeri ◴[] No.43377625[source]
No, the person you're responding to is absolutely right. The easy test (which has been done in papers again and again) is the ability to train linear probes (or non-linear classifier heads) on the current hidden representations to predict the nth-next token, and the fact that these probes have very high accuracy.
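
For anyone who hasn't seen a probe, here is a self-contained toy of the technique. Everything in it is synthetic: the "hidden states" are random 4-d vectors into which I deliberately leak a binary label standing in for a property of a later token, so it demonstrates how a probe is trained and scored, not what real models contain.

```python
import math
import random

def make_data(n, rng):
    """Synthetic 'hidden states' whose first coordinate leaks the label
    (an assumption of this toy, not a measurement of any model)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        h = [rng.gauss(0, 1) for _ in range(4)]
        h[0] += 3.0 if label else -3.0
        data.append((h, label))
    return data

def score(w, b, h):
    return sum(wi * hi for wi, hi in zip(w, h)) + b

def train_probe(data, epochs=20, lr=0.1):
    """Logistic-regression probe: one linear layer trained on frozen features."""
    w, b = [0.0] * 4, 0.0
    for _ in range(epochs):
        for h, y in data:
            z = max(-30.0, min(30.0, score(w, b, h)))  # clamp to avoid overflow
            g = 1.0 / (1.0 + math.exp(-z)) - y          # gradient of the log loss
            w = [wi - lr * g * hi for wi, hi in zip(w, h)]
            b -= lr * g
    return w, b

def accuracy(probe, data):
    w, b = probe
    return sum((score(w, b, h) > 0) == bool(y) for h, y in data) / len(data)

rng = random.Random(0)
probe = train_probe(make_data(500, rng))
held_out = make_data(200, rng)
print(accuracy(probe, held_out))
```

In the real experiments the features are a model's hidden activations at position t and the label is derived from the token at position t+n; high held-out accuracy is then evidence that information about the future token is already linearly readable.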
18. kerkeslager ◴[] No.43381331{5}[source]
> We use terms like think and want to describe processes that clearly do not involve any form of understanding.

...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.

> In the Karpathy video I mentioned he talks about how researchers found that models did have an internal representation of not knowing, but that the fine tuning was restricting it to providing answers. Giving it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know.

This is why I included the HTTP example: this is simply telling it to parrot the phrase "I don't know"--it doesn't understand that it doesn't know. From the LLM's perspective, it "knows" that the answer is "I don't know". It's returning a 200 OK that says "I don't know" rather than returning a 404.

Do you understand the distinction I'm making here?

> I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand I would also think that this also applies to humans, most people are not philosophers.

The average (non-programmer) human, when asked to write a "Hello, world" program, can definitely say they don't know how to program. And unlike the LLM, the human knows that this is different from answering the question. The LLM, in contrast, thinks it is answering the question when it says "I don't know"--it thinks "I don't know" is the correct answer.

Put another way, a human can distinguish between responses to these two questions, whereas an LLM can't:

1. What is my grandmother's maiden name?

2. What is the English translation of the Spanish phrase, "No sé."?

In the first question, you don't know the answer unless you are quite creepy; in the second case you do (or can find out easily). But the LLM tuned to answer "I don't know" thinks it knows the answer in both cases, and thinks the answer is the same.

replies(1): >>43385383 #
19. Lerc ◴[] No.43385383{6}[source]
>...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.

There is a difference between explanation by metaphor and lack of precision. If you think someone is implying something literal when they might be using a metaphor you can always ask for clarification. I know plenty of people that are utterly precise in their use of language, which leads them to being widely misunderstood because they think a weak precise signal is received as clearly as a strong imprecise signal. They usually think the failure in communication is in the recipient but in reality they are just accurately using the wrong protocol.

>Do you understand the distinction I'm making here?

I believe I do, and it is precisely this distinction that the researchers showed. By teaching a model to say "I don't know" for some information that they knew the model did not know the answer to, the model learned to respond "I don't know" for things that it did not know and that it was not explicitly taught to respond to with "I don't know". For it to acquire that ability to generalise to new cases the model has to have already had an internal representation of "That information is not available".

I'm not sure where you think a model converting its internal representation of not knowing something into words is distinct from a human converting its internal representation of not knowing into words.

When fine tuning directs a model to profess lack of knowledge, usually they will not give it the same specific "I don't know" text as a way to express that it does not know, because they want to bind the concept "lack of knowledge" to the concept of "communicate that I do not know" rather than to any particular phrase. Giving it many ways to say "I don't know" builds that binding rather than the crude "if X then emit Y" that you imagine it to be.