Backpropagation is a leaky abstraction (2016)

(karpathy.medium.com)

gchadwick ◴[02 Nov 25 07:20 UTC] No.45788468[source]▶

Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material from short articles like this, longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.

replies(2): >>45788631 #>>45788885 #

kubb ◴[02 Nov 25 09:03 UTC] No.45788885[source]▶

>>45788468 #

I was slightly surprised that my colleagues, who are extremely invested in capabilities of LLMs, didn’t show any interest in Karpathy’s communication on the subject when I recommended it to them.

Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather they need to believe and buy into them.

They’re more interested in science fiction discussions — how would we organize a society where all work is done by intelligent machines — than what kinds of tasks are LLMs good at today and why.

replies(9): >>45788975 #>>45789023 #>>45789131 #>>45789241 #>>45789316 #>>45789676 #>>45789975 #>>45791483 #>>45791925 #

Al-Khwarizmi ◴[02 Nov 25 09:31 UTC] No.45789023[source]▶

>>45788885 #

What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works (sentence written by a human despite use of "delve"). Everyone should have some notions on what LLMs can or cannot do, in order to use them successfully and not be misguided by their limitations, but we don't need everyone to understand what backpropagation is, just as most of us use cars without knowing much about how an internal combustion engine works.

And the issue you mention in the last paragraph is very relevant, since the scenario is plausible, so it is something we definitely should be discussing.

replies(2): >>45789298 #>>45789446 #

1. Marazan ◴[02 Nov 25 10:33 UTC] No.45789298[source]▶

>>45789023 #

Because if you don't understand how a tool works you can't use the tool to its full potential.

Imagine if you were using single layer perceptrons without understanding separability and going "just a few more tweaks and it will approximate XOR!"
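A minimal sketch of the point about separability, assuming the classic perceptron learning rule with a step activation (the function names, learning rate, and epoch count below are illustrative choices, not from the thread). The same training loop fits AND, which is linearly separable, but can never classify all four XOR points correctly, no matter how long it runs:

```python
def predict(w, b, x1, x2):
    """Step-activation perceptron: output 1 if the weighted sum is positive."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Classic perceptron learning rule on 2-input binary samples.

    epochs and lr are arbitrary illustrative values.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            delta = target - predict(w, b, x1, x2)
            if delta != 0:
                # Nudge the decision boundary toward the misclassified point.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            break  # every sample classified correctly
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples the trained perceptron classifies correctly."""
    correct = sum(predict(w, b, x1, x2) == t for (x1, x2), t in samples)
    return correct / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

Raising `epochs` does not help on XOR: no single line separates the two classes, so at least one of the four points is always misclassified.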

replies(4): >>45789385 #>>45789455 #>>45789613 #>>45795082 #

2. kubb ◴[02 Nov 25 10:53 UTC] No.45789385[source]▶

>>45789298 (TP) #

You hit the nail on the head, in my opinion.

There are things that you just can’t expect from current LLMs that people routinely expect from them.

They start out projects with those expectations. And that’s fine. But they don’t always learn from the outcomes of those projects.

3. Al-Khwarizmi ◴[02 Nov 25 11:08 UTC] No.45789455[source]▶

>>45789298 (TP) #

I don't think that's a good analogy, because if you're trying to train a single layer perceptron to approximate XOR you're not the end user.

replies(2): >>45789616 #>>45791832 #

4. tarsinge ◴[02 Nov 25 11:46 UTC] No.45789613[source]▶

>>45789298 (TP) #

I disagree in the case of LLMs, because they really are an accidental side effect of another tool. Not understanding the inner workings will make users attribute false properties to them. Once you understand how they work (how they generate plausible text), you get a far deeper grasp on their capabilities and how to tweak and prompt them.

And in fact this is true of any tool: you don’t have to know exactly how to build it, but any craftsman has a good understanding of how the tool works internally. LLMs are not a screw or a pen; they are more akin to an engine, and you have to know their subtleties if you build a car. And even screws have to be understood structurally in advanced usage. Not understanding the tool is maybe acceptable only for hobbyists.

replies(1): >>45795032 #

5. Marazan ◴[02 Nov 25 11:47 UTC] No.45789616[source]▶

>>45789455 #

The analogy is that if you don't understand the limitations of the tool, you may try to make it do something it is bad at and never understand why it will never do the thing you want, despite looking like it potentially could.

6. vajrabum ◴[02 Nov 25 17:18 UTC] No.45791832[source]▶

>>45789455 #

None of this is about an end user in the sense of the user of an LLM. This is aimed at the prospective user of a training framework which implements backpropagation at a high level of abstraction. As such it draws attention to training problems which arise inside the black box in order to motivate learning what is inside that box. There aren't any ML engineers who shouldn't know all about single layer perceptrons I think, and that makes for a nice analogy to real life issues in using SGD and backpropagation for ML training.

replies(1): >>45796542 #

7. adi_kurian ◴[03 Nov 25 01:26 UTC] No.45795032[source]▶

>>45789613 #

Could you provide an example of an advanced prompt technique or approach that one would be much more likely to employ if they had knowledge of X internal working?

8. og_kalu ◴[03 Nov 25 01:38 UTC] No.45795082[source]▶

>>45789298 (TP) #

If you want a good idea of how well LLMs will work for your use case then use them. Use them in different ways, for different things.

Knowledge of backprop, no matter how precise, and any convoluted 'theories' will not help you utilize LLMs any better. You'll be worse off if anything.

replies(1): >>45796577 #

9. Al-Khwarizmi ◴[03 Nov 25 07:04 UTC] No.45796542{3}[source]▶

>>45791832 #

The post I was replying to was about "colleagues, who are extremely invested in capabilities of LLMs" and then mentions how they are uninterested in how they work and just interested in what they can do and societal implications.

It sounds to me very much like end users, not people who are training LLMs.

10. Al-Khwarizmi ◴[03 Nov 25 07:11 UTC] No.45796577[source]▶

>>45795082 #

Yeah, that's what I'm trying to explain (maybe unsuccessfully). I do know backprop, I studied and used it back in the early 00s when it was very much not cool. But I don't think that knowledge is especially useful for using LLMs.

We don't even have a complete explanation of how we go from backprop to the emerging abilities we use and love, so who cares (for that purpose) how backprop works? It's not like we're actually using it to explain anything.

As I say in another comment, I often give talks to laypeople about LLMs and the mental model I present is something like supercharged Markov chain + massive training data + continuous vocabulary space + instruction tuning/RLHF. I think that provides the right abstraction level to reason about what LLMs can do and what their limitations are. It's irrelevant how the supercharged Markov chain works, in fact it's plausible that in the future one could replace backprop with some other learning algorithm and LLMs could still work in essentially the same way.
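For readers who have not met the baseline that this mental model "supercharges", here is a toy sketch of a plain word-level Markov chain, assuming a bigram model over a made-up corpus (the function names, corpus, and seed are hypothetical and chosen only for illustration). It picks each next word by looking only at the current word, which is the part an LLM replaces with a learned function over a long context:

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words observed right after it."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(model, start, length, seed=0):
    """Sample up to `length` words, conditioning only on the previous word."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        out.append(rng.choice(followers))
    return out


# Tiny illustrative corpus; a real model would see vastly more text.
corpus = "the cat sat on the mat and the cat ate the fish"
model = build_bigram_model(corpus)
sample = generate(model, "the", 8)
print(" ".join(sample))
```

Every adjacent pair in the output already occurs in the corpus, which is the limitation that the larger context, continuous vocabulary space, and instruction tuning in the description above are meant to overcome.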

Along the lines of your first paragraph, probably many teens who had a lot of time on their hands when Bing Chat was released, and some critical spirit to not get misled by the VS, have better intuition about what an LLM can do than many ML experts.
