385 points by vessenes | 48 comments

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, the token-choice method at each step leads to runaway errors -- these can't be damped mathematically.

Instead, he offers the idea that we should have something that is an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.
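
To make the contrast concrete, here is a toy sketch of my possibly-wrong reading of the idea: instead of committing to one token at a time, score whole candidate responses with a learned energy function and keep the lowest-energy one. Everything here (the ToyEnergyModel, the embeddings, the training it would need) is an invented placeholder, not LeCun's actual proposal -- JEPA-style models work in representation space -- it's just meant to show what "energy of an entire response" could look like.

    import torch
    import torch.nn as nn

    # Hypothetical energy scorer: maps embedded (prompt, response) pairs to a scalar.
    # A real energy-based setup would train this so good responses get low energy;
    # here the weights are random and serve only as a placeholder.
    class ToyEnergyModel(nn.Module):
        def __init__(self, dim=768):
            super().__init__()
            self.scorer = nn.Sequential(
                nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 1))

        def forward(self, prompt_vec, response_vec):
            return self.scorer(torch.cat([prompt_vec, response_vec], dim=-1)).squeeze(-1)

    def pick_lowest_energy(energy_model, prompt_vec, candidate_vecs):
        # Score each complete candidate response and keep the arg-min,
        # rather than sampling token by token.
        energies = energy_model(prompt_vec.expand(len(candidate_vecs), -1),
                                torch.stack(candidate_vecs))
        return int(energies.argmin())

    # Usage with made-up embeddings:
    # best = pick_lowest_energy(ToyEnergyModel(), torch.randn(768),
    #                           [torch.randn(768) for _ in range(4)])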

1. ActorNightly ◴[] No.43325670[source]
Not an official ML researcher, but I do happen to understand this stuff.

The problem with LLMs is that the output is inherently stochastic - i.e. there isn't an "I don't have enough information" option. This is due to the fact that LLMs are basically just giant lookup maps with interpolation.

Energy minimization is more of an abstract approach where you can use architectures that don't rely on things like differentiability. True AI won't be solely feedforward architectures like current LLMs. To give an answer, they will basically determine an algorithm on the fly that includes computation and search. To learn that algorithm (or its parameters) at training time, you need something that doesn't rely on continuous values but still converges to the right answer. So instead you assign a fitness score, like memory use or compute cycles, and optimize based on that. This is basically how search works with genetic algorithms or PSO.
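
(A toy illustration of the fitness-score idea, in case it helps: a minimal mutation-and-selection search that optimizes integer "parameters" against a non-differentiable fitness function. The fitness function and encoding are made up; the only point is that nothing here needs a gradient.)

    import random

    def fitness(params):
        # Hypothetical non-differentiable cost, e.g. "compute cycles used":
        # here, distance from a secret target plus a small penalty per parameter.
        target = [3, 1, 4, 1, 5]
        return sum(abs(p - t) for p, t in zip(params, target)) + 0.1 * len(params)

    def evolve(pop_size=50, generations=200):
        population = [[random.randint(0, 9) for _ in range(5)] for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=fitness)               # lower fitness = better
            survivors = population[: pop_size // 4]    # selection, no gradients involved
            children = []
            while len(survivors) + len(children) < pop_size:
                child = random.choice(survivors)[:]
                child[random.randrange(len(child))] = random.randint(0, 9)  # mutation
                children.append(child)
            population = survivors + children
        return min(population, key=fitness)

    print(evolve())  # converges to [3, 1, 4, 1, 5] with no derivatives anywhere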

replies(10): >>43365410 #>>43366234 #>>43366675 #>>43366830 #>>43366868 #>>43366901 #>>43366902 #>>43366953 #>>43368585 #>>43368625 #
2. seanhunter ◴[] No.43365410[source]
> The problem with LLMs is that the output is inherently stochastic - i.e. there isn't an "I don't have enough information" option. This is due to the fact that LLMs are basically just giant lookup maps with interpolation.

I don't think this explanation is correct. The output of the model, after all the attention heads etc. (as I understand it), is a probability distribution over tokens. So the model as a whole does have an ability to score low confidence in something by assigning it a low probability.

The problem is that this thing is a token (part of a word). So the LLM can say "I don't have enough information" to decide on the next part of a word, but it has no ability to say "I don't know what on earth I'm talking about" (in general - not associated with a particular token).
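
A minimal sketch of the per-token half, using Hugging Face transformers (gpt2 is just a convenient stand-in): read the next-token distribution and compute its entropy. The confidence signal is real, but it lives only at the token level.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]      # scores for the next token only
    probs = torch.softmax(logits, dim=-1)

    top_p, top_id = probs.max(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    print(tok.decode([int(top_id)]), float(top_p), float(entropy))
    # High entropy (a flat distribution) means "unsure about the next *token*",
    # which is not the same thing as "unsure whether the whole claim is true".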

replies(5): >>43365608 #>>43365655 #>>43365953 #>>43366351 #>>43366485 #
3. estebarb ◴[] No.43365608[source]
The problem is exactly that: the probability distribution. The network has no way to say: 0% everyone, this is nonsense, backtrack everything.

Other architectures, like energy-based models or Bayesian ones, can assess uncertainty. Transformers simply cannot do it (yet). Yes, there are ways to do it, but we are already spending millions to get coherent phrases; few will burn billions to train a model that can do that kind of assessment.

replies(1): >>43365684 #
4. duskwuff ◴[] No.43365655[source]
Right. And, as a result, low token-level confidence can end up indicating "there are other ways this could have been worded" or "there are other topics which could have been mentioned here" just as often as it does "this output is factually incorrect". Possibly even more often, in fact.
replies(1): >>43365813 #
5. ortsa ◴[] No.43365684{3}[source]
Has anybody ever messed with adding a "backspace" token?
replies(1): >>43365782 #
6. refulgentis ◴[] No.43365782{4}[source]
Yes. (https://news.ycombinator.com/item?id=36425375; I believe there's been more)

There's a quite intense backlog of new stuff that hasn't made it to prod. (I would have told you in 2023 that we would have, e.g., switched to Mamba-like architectures in at least one leading model.)

Broadly, it's probably unhelpful that:

- absolutely no one wants the PR of releasing a model that isn't competitive with the latest peers

- absolutely everyone wants to release an incremental improvement, yesterday

- Entities with no PR constraint, and no revenue repercussions when reallocating funds from surely-productive to experimental, don't show a significant improvement in results for the new things they try (I'm thinking of, e.g., the Allen Institute)

Another odd property I can't quite wrap my head around is the battlefield is littered with corpses that eval okay-ish, and should have OOM increases in some areas (I'm thinking of RWKV, and how it should be faster at inference), and they're not really in the conversation either.

Makes me think either A) I'm getting old and don't really understand ML from a technical perspective anyway, or B) hey, I've been maintaining a llama.cpp wrapper that works on every platform for a year now, I should trust my instincts: the real story is UX is king and none of these things actually improve the experience of a user even if benchmarks are ~=.

replies(2): >>43365962 #>>43367533 #
7. vessenes ◴[] No.43365813{3}[source]
My first reaction is that a model can’t, but a sampling architecture probably could. I’m trying to understand if what we have as a whole architecture for most inference now is responsive to the critique or not.
8. derefr ◴[] No.43365953[source]
You get scores for the outputs of the last layer; so in theory, you could notice when those scores form a particularly flat distribution, and fault.

What you can't currently get, from a (linear) Transformer, is a way to induce a similar observable "fault" in any of the hidden layers. Each hidden layer only speaks the "language" of the next layer after it, so there's no clear way to program an inference-framework-level observer side-channel that can examine the output vector of each layer and say "yup, it has no confidence in any of what it's doing at this point; everything done by layers feeding from this one will just be pareidolia — promoting meaningless deviations from the random-noise output of this layer into increasing significance."

You could in theory build a model as a Transformer-like model in a sort of pine-cone shape, where each layer feeds its output both to the next layer (where the final layer's output is measured and backpropped during training) and to an "introspection layer" that emits a single confidence score (a 1-vector). You start with a pre-trained linear Transformer base model, with fresh random-weighted introspection layers attached. Then you do supervised training of (prompt, response, confidence) triples, where on each training step, the minimum confidence score of all introspection layers becomes the controlled variable tested against the training data. (So you aren't trying to enforce that any particular layer notice when it's not confident, thus coercing the model to "do that check" at that layer; you just enforce that a "vote of no confidence" comes either from somewhere within the model, or nowhere within the model, at each pass.)

This seems like a hack designed just to compensate for this one inadequacy, though; it doesn't seem like it would generalize to helping with anything else. Some other architecture might be able to provide a fully-general solution to enforcing these kinds of global constraints.

(Also, it's not clear at all, for such training, "when" during the generation of a response sequence you should expect to see the vote-of-no-confidence crop up — and whether it would be tenable to force the model to "notice" its non-confidence earlier in a response-sequence-generating loop rather than later. I would guess that a model trained in this way would either explicitly evaluate its own confidence with some self-talk before proceeding [if its base model were trained as a thinking model]; or it would encode hidden thinking state to itself in the form of word-choices et al, gradually resolving its confidence as it goes. In neither case do you really want to "rush" that deliberation process; it'd probably just corrupt it.)
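
(In case the shape of that helps anyone, here's a rough PyTorch sketch of the introspection-head part as I read it: frozen pre-trained base, one small freshly-initialised confidence head per hidden layer, and the minimum head output is what gets trained against a supervised confidence label. All names, the pooling, and the loss choice are invented for illustration.)

    import torch
    import torch.nn as nn
    from transformers import AutoModelForCausalLM

    class IntrospectionWrapper(nn.Module):
        def __init__(self, base_name="gpt2"):
            super().__init__()
            self.base = AutoModelForCausalLM.from_pretrained(base_name)
            for p in self.base.parameters():       # keep the pre-trained base frozen
                p.requires_grad_(False)
            hidden = self.base.config.hidden_size
            n_layers = self.base.config.num_hidden_layers
            # one fresh, randomly-weighted confidence head per hidden layer
            self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_layers)])

        def forward(self, input_ids):
            out = self.base(input_ids, output_hidden_states=True)
            # hidden_states[1:] are the per-layer outputs; mean-pool over the sequence
            # and squash each head's score into (0, 1) as a "confidence"
            confs = [torch.sigmoid(head(states.mean(dim=1)))
                     for head, states in zip(self.heads, out.hidden_states[1:])]
            # the controlled variable: the lowest confidence anywhere in the stack
            return torch.stack(confs, dim=0).min(dim=0).values

    # Training step (sketch): binary cross-entropy between that minimum confidence and
    # the label from a (prompt, response, confidence) triple, so no particular layer is
    # forced to be the one that "notices".
    # loss = nn.functional.binary_cross_entropy(wrapper(input_ids), labels)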

9. vessenes ◴[] No.43365962{5}[source]
For sure read Stephenson’s essay on path dependence; it lays out a lot of these economic and social dynamics. TLDR - we will need a major improvement to see something novel pick up steam most likely.
replies(1): >>43367515 #
10. throw310822 ◴[] No.43366234[source]
> there isn't an "I don't have enough information" option. This is due to the fact that LLMs are basically just giant lookup maps with interpolation.

Have you ever tried telling ChatGPT that you're "in the city centre" and asking it whether you need to turn left or right to reach some landmark? It will not answer with the average of the directions given to everybody who asked the question before; it will respond by asking you to tell it where you are precisely and which way you are facing.

replies(1): >>43369776 #
11. skybrian ◴[] No.43366351[source]
I think some “reasoning” models do backtracking by inserting “But wait” at the start of a new paragraph? There’s more to it, but that seems like a pretty good trick.
12. Lerc ◴[] No.43366485[source]
I feel like we're stacking naive misinterpretations of how LLMs function on top of one another here. Grasping gradient descent and autoregressive generation can give you a false sense of confidence. It is like knowing how transistors make up logic gates and believing you know more about CPU design than you actually do.

Rather than inferring from how you imagine the architecture working, you can look at examples and counterexamples to see what capabilities they have.

One misconception is that predicting the next word means there is no internal idea on the word after next. The simple disproof of this is that models put 'an' instead of 'a' ahead of words beginning with vowels. It would be quite easy to detect (and exploit) behaviour that decided to use a vowel word just because it somewhat arbitrarily used an 'an'.

Models predict the next word, but they don't just predict the next word. They generate a great deal of internal information in service of that goal. Placing limits on their abilities by assuming the output they express is the sum total of what they have done is a mistake. The output probability is not what it thinks, it is a reduction of what it thinks.

One of Andrej Karpathy's recent videos talked about how researchers showed that models do have an internal sense of not knowing the answer, but fine-tuning on question answering did not give them the ability to express that knowledge. Finding information the model did and didn't know, then fine-tuning it to say "I don't know" for cases where it had no information, allowed the model to generalise and express "I don't know".

replies(6): >>43366739 #>>43367815 #>>43367895 #>>43368796 #>>43371175 #>>43373293 #
13. josh-sematic ◴[] No.43366675[source]
I don’t buy LeCun’s argument. Once you get good RL going (as we are now seeing with reasoning models) you can give the model a reward function that rewards a correct answer most highly, rewards an “I’m sorry but I don’t know” less highly than that, penalizes a wrong answer, and penalizes a confidently wrong answer even more severely. As the RL learns to maximize rewards, I would think it would find the strategy of saying it doesn’t know in cases where it can’t find an answer it deems to have a high probability of correctness.
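
As a toy illustration of that reward shaping (the numbers and the abstention check are invented; only the ordering matters):

    def reward(answer: str, is_correct: bool, stated_confidence: float) -> float:
        """Toy reward: correct > "I don't know" > wrong > confidently wrong."""
        if answer.strip().lower() in {"i don't know", "i'm sorry but i don't know"}:
            return 0.2                    # abstaining earns a small positive reward
        if is_correct:
            return 1.0                    # correct answers rewarded most highly
        # wrong answers are penalized, more severely the more confident they were
        return -1.0 - 2.0 * stated_confidence

    # reward("Paris", True, 0.9)          -> 1.0
    # reward("I don't know", False, 0.0)  -> 0.2
    # reward("Lyon", False, 0.95)         -> -2.9
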
replies(1): >>43366765 #
14. littlestymaar ◴[] No.43366739{3}[source]
Not an ML researcher or anything (I'm basically only a few Karpathy videos into ML, so please someone correct me if I'm misunderstanding this), but it seems that you're getting this backwards:

> One misconception is that predicting the next word means there is no internal idea on the word after next. The simple disproof of this is that models put 'an' instead of 'a' ahead of words beginning with vowels.

My understanding is that the model doesn't plan “'an' ahead of a word that starts with a vowel”: the model (or more accurately, the sampler) picks “an”, and then the model will never predict a word that starts with a consonant after that. It's not like it “knows” in advance that it wants to put a word with a vowel and then anticipates that it needs to put “an”; it generates a probability for both tokens “a” and “an”, picks one, and then when it generates the following token, it necessarily takes its previous choice into account and never puts a word starting with a vowel after it has already chosen “a”.

replies(3): >>43367069 #>>43368302 #>>43377625 #
15. Tryk ◴[] No.43366765[source]
How do you define the "correct" answer?
replies(2): >>43366875 #>>43368708 #
16. thijson ◴[] No.43366830[source]
I watched an Andrej Karpathy video recently. He said that hallucination happens because in the training data there were no examples where the answer is "I don't know". Maybe I'm misinterpreting what he was saying, though.

https://www.youtube.com/watch?v=7xTGNNLPyMI&t=4832s

17. TZubiri ◴[] No.43366868[source]
If multiple answers are equally likely, couldn't that be considered uncertainty? Conversely if there's only one answer and there's a huge leap to the second best, that's pretty certain.
18. jpadkins ◴[] No.43366875{3}[source]
obviously the truth is what is the most popular. /s
19. spmurrayzzz ◴[] No.43366902[source]
> i.e. there isn't an "I don't have enough information" option.

This is true in terms of default mode for LLMs, but there's a fair amount of research dedicated to the idea of training models to signal when they need grounding.

SelfRAG is an interesting, early example of this [1]. The basic idea is that the model is trained to first decide whether retrieval/grounding is necessary and then, if so, after retrieval it outputs certain "reflection" tokens to decide whether a passage is relevant to answer a user query, whether the passage is supported (or requires further grounding), and whether the passage is useful. A score is calculated from the reflection tokens.

The model then critiques itself further by generating a tree of candidate responses, and scoring them using a weighted sum of the score and the log probabilities of the generated candidate tokens.
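
(A loose, toy rendering of that last scoring step -- not SelfRAG's actual implementation, and the weights and names are invented -- just to show the "weighted sum of a reflection score and the candidate's token log-probabilities" shape:)

    def candidate_score(reflection_score: float,
                        token_logprobs: list[float],
                        weight: float = 0.5) -> float:
        # length-normalised log-probability of the generated candidate...
        avg_logprob = sum(token_logprobs) / max(len(token_logprobs), 1)
        # ...combined with the score derived from the reflection tokens
        # (relevance / support / usefulness, in the paper's terms)
        return weight * reflection_score + (1 - weight) * avg_logprob

    candidates = [
        ("answer grounded in a retrieved passage", 0.9, [-0.2, -0.1, -0.3]),
        ("answer with no support",                 0.1, [-0.1, -0.1, -0.1]),
    ]
    best = max(candidates, key=lambda c: candidate_score(c[1], c[2]))
    print(best[0])   # picks the grounded answer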

We can probably quibble about the loaded terms used here like "self-reflection", but the idea that models can be trained to know when they don't have enough information isn't pure fantasy today.

[1] https://arxiv.org/abs/2310.11511

EDIT: I should also note that I generally do side with LeCun's stance on this, but not due to the "not enough information" canard. I think models learning from abstraction (i.e. JEPA, energy-based models) rather than memorization is the better path forward.

20. unsupp0rted ◴[] No.43366901[source]
> The problem with LLMs is that the output is inherently stochastic

Isn't that true with humans too?

There's some leap humans make, even as stochastic parrots, that lets us generate new knowledge.

replies(1): >>43374566 #
21. yunwal ◴[] No.43367069{4}[source]
The model still has some representation of whether the word after an/a is more likely to start with a vowel or not when it outputs a/an. You can trivially understand this is true by asking LLMs to answer questions with only one correct answer.

"The animal most similar to a crocodile is:"

https://chatgpt.com/share/67d493c2-f28c-8010-82f7-0b60117ab2...

It will always say "an alligator". It chooses "an" because somewhere in the next word predictor it has already figured out that it wants to say alligator when it chooses "an".

If you ask the question the other way around, it will always answer "a crocodile" for the same reason.

replies(1): >>43367196 #
22. littlestymaar ◴[] No.43367196{5}[source]
Again, I don't think that's a good example, because everything about the answer is in the prompt, so obviously, from the start, "alligator" scores high; the model is just waiting for an "an" to occur as an occasion to put it there.

That doesn't mean it knows "in advance" what it wants to say; it's just that at every step the alligator is lurking in the logits because it directly derives from the prompt.

replies(1): >>43367750 #
23. Ericson2314 ◴[] No.43367515{6}[source]
Yeah, everyone spending way too much money on things we barely understand is a recipe for insane path dependence.
24. ortsa ◴[] No.43367533{5}[source]
Oh yeah, that's exactly what I was thinking of! Seems like it would be very useful for expert models in domains with more definite "edges" (if I'm understanding it right).

As for the fragmentation of progress, I guess that's just par for the course for any tech with such a heavy private/open source split. It would take a huge amount of work to trawl through this constant stream of 'breakthroughs' and put them all together.

25. metaxz ◴[] No.43367750{6}[source]
You write: "it's just that at every step the alligator is lurking in the logits because it directly derives from the prompt" - but isn't that the whole point: at the moment the model writes "an", it isn't just spitting out a random article (or a 50/50 distribution of articles or other words for that matter); rather, "an" gets a high probability because the model internally knows that "alligator" is the correct thing after that. While it can only emit one token in this step, it will emit "an" to make it consistent with its alligator knowledge "lurking". And btw while not even directly relevant, the word alligator isn't in the prompt. Sure, it derives from the prompt but so does every an LLM generates, and same for any other AI mechanism for generating answers.
replies(1): >>43369344 #
26. metaxz ◴[] No.43367815{3}[source]
Thanks for writing this so clearly... I hear wrong/misguided arguments like the ones we see here every day from friends, colleagues, "experts in the media", etc.

It's strange, because just a moment of thinking will show that such ideas are wrong or paint a clearly incomplete picture. And there are plenty of analogies to the dangers of such reductionism. It should be obviously wrong to anyone who has at least tried ChatGPT.

My only explanation is that a denial mechanism must be at play. It simply feels more comfortable to diminish LLM capabilities and/or feel that you understand them from reading a Medium article on transformer networks than to consider the consequences of their inner black-box nature.

27. ◴[] No.43367895{3}[source]
28. Lerc ◴[] No.43368302{4}[source]
yunwal has provided one example. Here's another, using a much smaller model.

https://chat.groq.com/?prompt=If+a+person+from+Ontario+or+To...

The response "If a person from Ontario or Toronto is a Canadian, a person from Sydney or Melbourne would be an Australian!"

It seems mighty unlikely that it chose Australian as the country because of the 'an', or that it chose to put the 'an' at that point in the sentence for any reason other than that the word Australian was going to be next.

For any argument where you think this does not mean they have some idea of what is to come, try to come up with a test to see if your hypothesis is true or not, then give that test a try.
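
Here's one such test anyone can run with a small open model: compare the probability of " a" versus " an" at the position right before the answer. If "an" wins only when the natural next word starts with a vowel, the article choice is being driven by a word that hasn't been emitted yet. (gpt2 is just a convenient stand-in here; larger models show the effect much more reliably.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def article_probs(prompt):
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
        a_id, an_id = tok.encode(" a")[0], tok.encode(" an")[0]
        return {"a": float(probs[a_id]), "an": float(probs[an_id])}

    print(article_probs("The animal most similar to a crocodile is"))
    print(article_probs("The animal most similar to an alligator is"))
    # If " an" dominates in the first case, the model already "plans" a
    # vowel-initial word (alligator) before it has written it.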

29. itkovian_ ◴[] No.43368585[source]
>This is due to the fact that LLMs are basically just giant lookup maps with interpolation.

This is obviously not true at this point except for the most loose definition of interpolation.

>don't rely on things like differentiability.

I've never heard LeCun say we need to move away from gradient descent. The opposite, actually.

30. throwawaymaths ◴[] No.43368625[source]
I don't think the stochasticity is the problem -- the problem is that the model gets "locked in" once it picks a token, and there's no takesies-backsies.

That also entails information destruction in the form of the discarded logits table, but for the most part that should be accounted for in the last step before the final feedforward.

31. josh-sematic ◴[] No.43368708{3}[source]
Certainly not possible in all domains but equally certainly possible in some. There’s not much controversy about the height of the Eiffel Tower or how to concatenate two numpy arrays.
32. jkhdigital ◴[] No.43368796{3}[source]
I think your analogy about logic gates vs. CPUs is spot on. Another apt analogy would be missing the forest for the trees—the model may in fact be generating a complete forest, but its output (natural language) is inherently serial so it can only plant one tree at a time. The sequence of distributions that is the proximate driver of token selection is just the final distillation step.
33. littlestymaar ◴[] No.43369344{7}[source]
> While it can only emit one token in this step, it will emit "an" to make it consistent with its alligator knowledge "lurking".

It will also emit "a" from time to time without issue, though, but will never spit out "alligator" right after that; that's it.

> Sure, it derives from the prompt but so does every word an LLM generates, and same for any other AI mechanism for generating answers.

Not really: because of the autoregressive nature of LLMs, the longer the response, the more it will depend on its own response rather than the prompt. That's why you can see totally opposite responses from an LLM to the same query if you aren't asking basic factual questions. I saw a tool on reddit a few months ago that allowed you to see which words in the generation were the most “opinionated” (where the sampler had to choose between alternative words that were close in probability), and it was easy to see that you could dramatically affect the result by just changing certain words.

> "an" gets a high probability because the model internally knows that "alligator" is the correct thing after that.

This is true, though it only works with this kind of prompt because the output of the LLM has little impact on the generation.

Globally I see what you mean, and I don't disagree with you, but at the same time, I think that saying LLMs have a sense of anticipating further tokens misses their ability to be driven astray by their own output: they have some information that will affect further tokens, but any token that gets emitted can, and will, change that information in a way that can dramatically change the “plans”. And that's why I think using trivial questions isn't a good illustration, because it sweeps this effect under the rug.

34. wavemode ◴[] No.43369776[source]
That's because, based on the training data, the most likely response to asking for directions is to clarify exactly where you are and what you see.

But if you ask it in terms of a knowledge test ("I'm at the corner of 1st and 2nd, what public park am I standing next to?") a model lacking web search capabilities will confidently hallucinate (unless it's a well-known park).

In fact, my personal opinion is that therein lies the most realistic way to reduce hallucination rates: rather than trying to train models to say "I don't know" (which is not really a trainable thing - models are fundamentally unaware of the limits of their own training data), instead just train them on which kinds of questions warrant a web search and which ones should be answered creatively.

replies(1): >>43370113 #
35. QuesnayJr ◴[] No.43370113{3}[source]
I tried this just now on Chatbot Arena, and both chatbots asked for more information.

One was GPT 4.5 preview, and one was cohort-chowder (which is someone's idea of a cute code name, I assume).

replies(1): >>43370276 #
36. wavemode ◴[] No.43370276{4}[source]
I tried this just now on Chatbot Arena, and both chatbots very confidently got the name of the park wrong.

Perhaps you thought I meant "1st and 2nd" literally? I was just using those as an example so I don't reveal where I live. You should use actual street names that are near a public park, and you can feel free to specify the city and state.

replies(1): >>43370359 #
37. QuesnayJr ◴[] No.43370359{5}[source]
I did think you meant it literally. Since I can't replicate the question you asked, I have no way of verifying your claim.
replies(1): >>43370600 #
38. QuesnayJr ◴[] No.43370823{7}[source]
Neither do I. Right after I read your reply I knew I had made a mistake engaging with you.
39. flamedoge ◴[] No.43371175{3}[source]
It literally doesn't know how to handle 'I don't know' and needs to be taught. Fascinating.
replies(1): >>43371338 #
40. Lerc ◴[] No.43371338{4}[source]
I think it would be more accurate to say that, after fine-tuning on a series of questions with answers, it thinks that you don't want to hear "I don't know".
replies(1): >>43374524 #
41. cruffle_duffle ◴[] No.43373293{3}[source]
> It would be quite easy to detect (and exploit) behaviour that decided to use a vowel word just because it somewhat arbitrarily used an 'an'.

That is a very interesting observation!

Doesn’t that internal state get blown away and recreated for every “next token”? Isn’t the output always the previous context plus the new token, which gets fed back in, and out pops the next token? There is no transfer of internal state to the new iteration beyond what is “encoded” in its input tokens?

replies(1): >>43373930 #
42. Lerc ◴[] No.43373930{4}[source]
>Doesn’t that internal state get blown away and recreated for every “next token”

That is correct. When a model has a good idea of the next 5 words, after it has emitted the first of those 5, most architectures make no further use of the other 4 and likely regenerate the same information again in the next inference cycle.

There are architectures that don't discard all that information but the standard LLM has generally outperformed them, for now.

There are interesting philosophical implications if LLMs were to advance to a level where they could be considered sentient. Would it not be constantly creating and killing a thinking being for every token? On the other hand, if context is considered memory, perhaps continuity of identity is based upon memory, and all that other information is simply forgotten idle thoughts. We have no concept of what our previous thoughts were except from our memory. Is that not the same?

Sometimes I wonder if some of the resistance to AI is because it can do things that we think require abilities that we would like to believe we possess ourselves, and showing that they are not necessary creates the possibility that we might not have those abilities.

There was a great observation recently in an interview (I forget the source, but the interviewer's last name was Bi) that some of the discoveries that met the most resistance in history such as the Earth orbiting the Sun, or Darwin's theory of evolution were similar in that they implied that we are not a unique special case.

43. kerkeslager ◴[] No.43374524{5}[source]
I think it's more fundamental than that. If you start saying "it thinks" in regards to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily.

If the training data contained a bunch of answers to questions which were simply "I don't know", you could get an LLM to say "I don't know" but that's still not actually a concept of not knowing. That's just knowing that the answer to your question is "I don't know".

It's essentially like if you had an HTTP server that responded to requests for nonexistent documents with a "200 OK" containing "Not found". It's fundamentally missing the "404 Not found" concept.

LLMs just have a bunch of words--they don't understand what the words mean. There's no metacognition going on by which it could think "I don't know", or even realize that you would want to know that.

replies(1): >>43375507 #
44. borgdefenser ◴[] No.43374566[source]
I think it is because we don't feel the random and chaotic nature of what we know as individuals.

If I had been born a day earlier or later I would have a completely different life because of initial conditions and randomness but life doesn't feel that way even though I think this is obviously true.

45. Lerc ◴[] No.43375507{6}[source]
>I think it's more fundamental than that. If you start saying "it thinks" in regards to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily.

I'm not sure this objection is terribly helpful. We use terms like think and want to describe processes that clearly don't involve any form of understanding. Electrons do not have motivations, but they 'want' to go to a lower energy level in an atom. You can hold down the trigger for the fridge light to make it 'think' that the door has not been opened. These are uncontentious phrases that convey useful ideas.

I understand that when people are working towards producing reasoning machines the words might be working in similar spaces, but really, when someone is making claims about machines having awareness, understanding, or thinking, they make it quite clear what context they are talking about.

As to the rest of your comment, I simply disagree. If you think of a concept as an internal representation of a piece of information, then it has been shown that they do have such representations. In the Karpathy video I mentioned he talks about how researchers found that models did have an internal representation of not knowing, but that the fine tuning was restricting it to providing answers. Giving it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know generalised: the model provided "I don't know" for examples that were not in the training data. For the fine-tuning examples to succeed in that, the model must already contain the concept.

I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand, I would also think that this applies to humans; most people are not philosophers.

I think that the fact that models can express details about words shows that they do have detailed information about what each word means semantically. In many respects, because tokenisation indexes into embeddings, it would perhaps be more accurate to say that they have a better understanding of the semantic information of what words mean than of what the words actually are. This is why they are poor at spelling but can give you detailed information about the thing they can't spell.

replies(1): >>43381331 #
46. numeri ◴[] No.43377625{4}[source]
No, the person you're responding to is absolutely right. The easy test (which has been done in papers again and again) is the ability to train linear probes (or non-linear classifier heads) on the current hidden representations to predict the nth-next token, and the fact that these probes have very high accuracy.
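
For anyone curious what such a probe looks like, here is a minimal sketch (frozen base model, single linear layer, training details made up): the probe reads the hidden state at position t and is trained to predict the token at t+2, i.e. the token after the next one. Accuracy well above chance on held-out text is the evidence that information about later tokens is already present.

    import torch
    import torch.nn as nn
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    base = AutoModelForCausalLM.from_pretrained("gpt2")
    for p in base.parameters():
        p.requires_grad_(False)            # probe only; the LM itself stays frozen

    probe = nn.Linear(base.config.hidden_size, base.config.vocab_size)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

    def probe_step(text):
        ids = tok(text, return_tensors="pt").input_ids
        hidden = base(ids, output_hidden_states=True).hidden_states[-1][0]  # (seq, dim)
        inputs, targets = hidden[:-2], ids[0, 2:]     # position t predicts token t+2
        loss = nn.functional.cross_entropy(probe(inputs), targets)
        opt.zero_grad(); loss.backward(); opt.step()
        return float(loss)

    # Train probe_step over a corpus, then measure the probe's top-1 accuracy on
    # held-out text; high accuracy = the current hidden state already encodes
    # the n+2 token.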
47. kerkeslager ◴[] No.43381331{7}[source]
> We use terms like think and want to describe processes that are clearly not involve any form of understanding.

...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.

> In the Karpathy video I mentioned he talks about how researchers found that models did have an internal representation of not knowing, but that the fine tuning was restricting it to providing answers. Giving it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know.

This is why I included the HTTP example: this is simply telling it to parrot the phrase "I don't know"--it doesn't understand that it doesn't know. From the LLM's perspective, it "knows" that the answer is "I don't know". It's returning a 200 OK that says "I don't know" rather than returning a 404.

Do you understand the distinction I'm making here?

> I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand I would also think that this also applies to humans, most people are not philosophers.

The average (non-programmer) human, when asked to write a "Hello, world" program, can definitely say they don't know how to program. And unlike the LLM, the human knows that this is different from answering the question. The LLM, in contrast thinks it is answering the question when it says "I don't know"--it thinks "I don't know" is the correct answer.

Put another way, a human can distinguish between responses to these two questions, whereas an LLM can't:

1. What is my grandmother's maiden name?

2. What is the English translation of the Spanish phrase, "No sé."?

In the first question, you don't know the answer unless you are quite creepy; in the second case you do (or can find out easily). But the LLM tuned to answer "I don't know" thinks it knows the answer in both cases, and thinks the answer is the same.

replies(1): >>43385383 #
48. Lerc ◴[] No.43385383{8}[source]
>...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.

There is a difference between explanation by metaphor and lack of precision. If you think someone is implying something literal when they might be using a metaphor, you can always ask for clarification. I know plenty of people who are utterly precise in their use of language, which leads to them being widely misunderstood, because they think a weak precise signal is received as clearly as a strong imprecise signal. They usually think the failure in communication is in the recipient, but in reality they are just accurately using the wrong protocol.

>Do you understand the distinction I'm making here?

I believe I do, and it is precisely this distinction that the researchers showed. By teaching a model to say "I don't know" for some information that they knew the model did not know the answer to, the model learned to respond "I don't know" for things it did not know, even though it was not explicitly taught to respond to those with "I don't know". For it to acquire that ability to generalise to new cases, the model has to have already had an internal representation of "that information is not available".

I'm not sure why you think a model converting its internal representation of not knowing something into words is distinct from a human converting its internal representation of not knowing into words.

When fine-tuning directs a model to profess lack of knowledge, they usually will not give it the same specific "I don't know" text every time as the way to express that it does not know, because they want to bind the concept of "lack of knowledge" to the concept of "communicate that I do not know" rather than to any particular phrase. Giving it many ways to say "I don't know" builds that binding rather than the crude "if X then emit Y" that you imagine it to be.