> We use terms like think and want to describe processes that clearly do not involve any form of understanding.
...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.
> In the Karpathy video I mentioned, he talks about how researchers found that models did have an internal representation of not knowing, but that the fine-tuning was restricting it to providing answers. The fix was giving it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know.
This is why I included the HTTP example: this is simply telling it to parrot the phrase "I don't know"--it doesn't understand that it doesn't know. From the LLM's perspective, it "knows" that the answer is "I don't know". It's returning a 200 OK that says "I don't know" rather than returning a 404.
Do you understand the distinction I'm making here?
> I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand, I would think that this also applies to humans; most people are not philosophers.
The average (non-programmer) human, when asked to write a "Hello, world" program, can definitely say they don't know how to program. And unlike the LLM, the human knows that this is different from answering the question. The LLM, in contrast, thinks it is answering the question when it says "I don't know"--it thinks "I don't know" is the correct answer.
Put another way, a human can distinguish between responses to these two questions, whereas an LLM can't:
1. What is my grandmother's maiden name?
2. What is the English translation of the Spanish phrase, "No sé."?
In the first case, you don't know the answer unless you are quite creepy; in the second case you do (or can find out easily). But the LLM tuned to answer "I don't know" thinks it knows the answer in both cases, and thinks the answer is the same.
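To make the HTTP analogy concrete, here is a minimal Python sketch of the two behaviors applied to the two questions above. It only illustrates the analogy and is not a claim about how any LLM is implemented; `KNOWN_FACTS`, `tuned_to_say_idk`, and `signals_missing` are hypothetical names made up for this example.

```python
from http import HTTPStatus

# Hypothetical stand-in for "what the system actually has an answer for".
KNOWN_FACTS = {
    'English translation of "No sé."': "I don't know",
}

Q_GRANDMOTHER = "My grandmother's maiden name"
Q_TRANSLATION = 'English translation of "No sé."'


def tuned_to_say_idk(question):
    """Always reports success; a missing fact becomes the *content* "I don't know"."""
    body = KNOWN_FACTS.get(question, "I don't know")
    return (HTTPStatus.OK, body)


def signals_missing(question):
    """Reports a missing fact out-of-band, via the status code instead of the body."""
    if question in KNOWN_FACTS:
        return (HTTPStatus.OK, KNOWN_FACTS[question])
    return (HTTPStatus.NOT_FOUND, "")


# The first responder gives the same (status, body) pair for both questions,
# so a caller cannot tell "no answer" apart from "the answer is this phrase".
print(tuned_to_say_idk(Q_GRANDMOTHER) == tuned_to_say_idk(Q_TRANSLATION))

# The second responder gives different status codes for the two questions.
print(signals_missing(Q_GRANDMOTHER)[0], signals_missing(Q_TRANSLATION)[0])
```

In this sketch the status code plays the role of "knowing that you don't know": it is information about the answer, carried separately from the answer itself.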