Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.
Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.
Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather they need to believe and buy into them.
They’re more interested in science fiction discussions — how would we organize a society where all work is done by intelligent machines — than what kinds of tasks are LLMs good at today and why.
And the issue you mention in the last paragraph is very relevant, since the scenario is plausible, so it is something we definitely should be discussing.
Imagine if you were using single layer perceptrons without understanding seperability and going "just a few more tweaks and it will approximate XOR!"
Knowledge of backprop no matter how precise, and any convoluted 'theories' will not make you utilize LLMs any better. You'll be worse off if anything.
We don't even have a complete explanation of how we go from backprop to the emerging abilities we use and love, so who cares (for that purpose) how backprop works? It's not like we're actually using it to explain anything.
As I say in another comment, I often give talks to laypeople about LLMs and the mental model I present is something like supercharged Markov chain + massive training data + continuous vocabulary space + instruction tuning/RLHF. I think that provides the right abstraction level to reason about what LLMs can do and what their limitations are. It's irrelevant how the supercharged Markov chain works, in fact it's plausible that in the future one could replace backprop with some other learning algorithm and LLMs could still work in essentially the same way.
In the line of your first paragraph, probably many teens who had a lot of time in their hands when Bing Chat was released, and some critical spirit to not get misled by the VS, have better intuition about what an LLM can do than many ML experts.