Backpropagation is a leaky abstraction (2016)

(karpathy.medium.com)

346 points swatson741 | 1 comments | 02 Nov 25 05:20 UTC


gchadwick ◴[02 Nov 25 07:20 UTC] No.45788468[source]▶

Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material, from short articles like this to longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.

replies(2): >>45788631 #>>45788885 #

kubb ◴[02 Nov 25 09:03 UTC] No.45788885[source]▶

>>45788468 #

I was slightly surprised that my colleagues, who are extremely invested in capabilities of LLMs, didn’t show any interest in Karpathy’s communication on the subject when I recommended it to them.

Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather they need to believe and buy into them.

They’re more interested in science fiction discussions (how would we organize a society where all work is done by intelligent machines) than in what kinds of tasks LLMs are good at today and why.

replies(9): >>45788975 #>>45789023 #>>45789131 #>>45789241 #>>45789316 #>>45789676 #>>45789975 #>>45791483 #>>45791925 #

Al-Khwarizmi ◴[02 Nov 25 09:31 UTC] No.45789023[source]▶

>>45788885 #

What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works (sentence written by a human despite use of "delve"). Everyone should have some notions on what LLMs can or cannot do, in order to use them successfully and not be misguided by their limitations, but we don't need everyone to understand what backpropagation is, just as most of us use cars without knowing much about how an internal combustion engine works.

And the issue you mention in the last paragraph is very relevant, since the scenario is plausible, so it is something we definitely should be discussing.

replies(2): >>45789298 #>>45789446 #

Marazan ◴[02 Nov 25 10:33 UTC] No.45789298[source]▶

>>45789023 #

Because if you don't understand how a tool works, you can't use the tool to its full potential.

Imagine if you were using single-layer perceptrons without understanding separability and going "just a few more tweaks and it will approximate XOR!"
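
For readers who have not seen the XOR limitation in action, here is a minimal sketch in plain Python. It assumes the classic perceptron learning rule with a step activation; the helper names (`train_perceptron`, `accuracy`) and the hyperparameters are illustrative choices, not anything from the article or the thread. It trains the same single-layer model on AND, which is linearly separable, and on XOR, which is not.

```python
from itertools import product


def train_perceptron(samples, epochs=100, lr=1.0):
    """Train a 2-input perceptron with the classic update rule.

    samples: list of ((x1, x2), target) pairs with targets in {0, 1}.
    Returns the learned (w1, w2, bias).
    """
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            pred = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            delta = target - pred
            if delta != 0:
                # Nudge the decision boundary toward the misclassified point.
                w1 += lr * delta * x1
                w2 += lr * delta * x2
                bias += lr * delta
                errors += 1
        if errors == 0:
            # A full pass with no mistakes: the data has been separated.
            break
    return w1, w2, bias


def accuracy(weights, samples):
    """Fraction of samples the perceptron classifies correctly."""
    w1, w2, bias = weights
    correct = sum(
        (1 if w1 * x1 + w2 * x2 + bias > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return correct / len(samples)


inputs = list(product([0, 1], repeat=2))
and_data = [((p, q), p & q) for p, q in inputs]
xor_data = [((p, q), p ^ q) for p, q in inputs]

and_acc = accuracy(train_perceptron(and_data), and_data)
xor_acc = accuracy(train_perceptron(xor_data), xor_data)

print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

AND is fit perfectly, while XOR never gets more than three of the four points right, because no single line separates its two classes. Raising `epochs` or changing `lr` does not help, which is the "just a few more tweaks" trap.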

replies(4): >>45789385 #>>45789455 #>>45789613 #>>45795082 #

Al-Khwarizmi ◴[02 Nov 25 11:08 UTC] No.45789455[source]▶

>>45789298 #

I don't think that's a good analogy, because if you're trying to train a single-layer perceptron to approximate XOR, you're not the end user.

replies(2): >>45789616 #>>45791832 #

Marazan ◴[02 Nov 25 11:47 UTC] No.45789616[source]▶

>>45789455 #

The analogy is that if you don't understand the limitations of the tool, you may try to make it do something it is bad at, and never understand why it will never do the thing you want despite looking like it potentially could.
