Backpropagation is a leaky abstraction (2016)

(karpathy.medium.com)

346 points swatson741 | 2 comments | 02 Nov 25 05:20 UTC

gchadwick ◴[02 Nov 25 07:20 UTC] No.45788468

Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material from short articles like this, longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.

kubb ◴[02 Nov 25 09:03 UTC] No.45788885

>>45788468 #

I was slightly surprised that my colleagues, who are extremely invested in the capabilities of LLMs, didn’t show any interest in Karpathy’s material on the subject when I recommended it to them.

Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather, they need to believe in them and buy into them.

They’re more interested in science fiction discussions (how would we organize a society where all work is done by intelligent machines?) than in what kinds of tasks LLMs are good at today and why.

Al-Khwarizmi ◴[02 Nov 25 09:31 UTC] No.45789023

>>45788885 #

What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works (sentence written by a human despite use of "delve"). Everyone should have some notion of what LLMs can or cannot do, in order to use them successfully and not be misled by their limitations, but we don't need everyone to understand what backpropagation is, just as most of us use cars without knowing much about how an internal combustion engine works.

And the issue you mention in your last paragraph is very relevant: the scenario is plausible, so it is something we definitely should be discussing.

Marazan ◴[02 Nov 25 10:33 UTC] No.45789298

>>45789023 #

Because if you don't understand how a tool works, you can't use the tool to its full potential.

Imagine if you were using single-layer perceptrons without understanding separability and going "just a few more tweaks and it will approximate XOR!"
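A minimal sketch of the separability point, in plain Python with hypothetical helper names (`train_perceptron`, `truth_table`): the classic perceptron learning rule applied to two-input truth tables. AND is linearly separable, so training stops making mistakes after a few passes. XOR is not, so no number of extra epochs helps. Appending a hand-picked `x1*x2` feature (an illustrative choice, not the only possible fix) makes XOR separable, and training then converges.

```python
from itertools import product


def train_perceptron(samples, epochs=1000, lr=1.0):
    """Rosenblatt perceptron rule with a step activation.

    samples: list of (feature_tuple, target) pairs with targets in {0, 1}.
    Returns (weights, bias, converged); converged is True only if some
    full pass over the data produced no misclassifications.
    """
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation > 0 else 0
            err = target - pred  # -1, 0 or +1
            if err:
                errors += 1
                # Nudge the decision boundary toward the misclassified point.
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if errors == 0:
            return w, b, True
    return w, b, False


def truth_table(fn, lift=False):
    """All four binary input pairs; optionally append x1*x2 as a third feature."""
    rows = []
    for x1, x2 in product((0, 1), repeat=2):
        features = (x1, x2, x1 * x2) if lift else (x1, x2)
        rows.append((features, fn(x1, x2)))
    return rows


AND = truth_table(lambda a, b: a & b)
XOR = truth_table(lambda a, b: a ^ b)
XOR_LIFTED = truth_table(lambda a, b: a ^ b, lift=True)

for name, data in (("AND", AND), ("XOR", XOR), ("XOR + x1*x2", XOR_LIFTED)):
    _, _, converged = train_perceptron(data)
    print(f"{name}: {'converged' if converged else 'never converged'}")
# Prints:
# AND: converged
# XOR: never converged
# XOR + x1*x2: converged
```

Raising `epochs` or changing `lr` does nothing for raw XOR. Only changing the representation helps, whether through an extra feature as here or through a hidden layer trained with backpropagation.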

tarsinge ◴[02 Nov 25 11:46 UTC] No.45789613

>>45789298 #

I disagree in the case of LLMs, because they really are an accidental side effect of another tool. Not understanding the inner workings will make users attribute false properties to them. Once you understand how they work (how they generate plausible text), you get a far deeper grasp on their capabilities and how to tweak and prompt them.
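A toy sketch of what "generating plausible text" means mechanically, under loudly stated assumptions: the `LOGITS` table below is a hypothetical, hand-written stand-in for the network, scoring the next token from the previous token only, whereas a real LLM scores it from the entire context. Generation is a loop that turns scores into probabilities with a softmax, samples one token, appends it, and repeats; `temperature` is one of the knobs that loop exposes.

```python
import math
import random

# Hypothetical toy "model": next-token scores (logits) conditioned only on the
# previous token. A real LLM computes such scores with a neural network over
# the whole context; this dict is purely illustrative.
LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.0},
    "a": {"cat": 1.0, "dog": 1.5},
    "cat": {"sat": 2.0, "ran": 0.5},
    "dog": {"ran": 2.0, "sat": 0.5},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}


def softmax(scores, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: s / temperature for tok, s in scores.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate(temperature=1.0, seed=None, max_tokens=10):
    """Autoregressive loop: sample one token, append it, repeat until </s>."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) <= max_tokens:
        probs = softmax(LOGITS[tokens[-1]], temperature)
        choices, weights = zip(*probs.items())
        tokens.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(t for t in tokens if t not in ("<s>", "</s>"))


print(generate(temperature=1.0, seed=0))
print(generate(temperature=0.01, seed=0))  # effectively greedy
```

At `temperature=0.01` the loop becomes effectively greedy and always returns "the cat sat"; at 1.0 it wanders among the other continuations the table makes plausible.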

And in fact this is true of any tool: you don’t have to know exactly how to build it, but any craftsman has a good understanding of how the tool works internally. LLMs are not a screw or a pen; they are more akin to an engine, and you have to know its subtleties if you build a car. Even screws have to be understood structurally in advanced usage. Perhaps only hobbyists can get away with not understanding their tools.

adi_kurian ◴[03 Nov 25 01:26 UTC] No.45795032

>>45789613 (TP) #

Could you provide an example of an advanced prompting technique or approach that one would be much more likely to employ if they had knowledge of some internal working X?

↑