nirinor ◴[] No.45790653[source]
It's a nitpick, but backpropagation is getting a bad rap here. These examples are about gradients + gradient descent variants being a leaky abstraction for optimization [1].

Backpropagation is a specific algorithm for computing gradients of composite functions, but even the failures that do come from composition (multiple sequential sigmoids causing exponential gradient decay) are not backpropagation-specific: that's just how the gradients behave for that function, whatever algorithm you use. The remedy of having people calculate their own backwards pass is useful because they are _calculating their own derivatives_ for the functions, and get a chance to notice the exponents creeping in. Ask me how I know ;)
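
For concreteness, a minimal sketch of what I mean (plain Python, nothing framework-specific, not from the blog post): the chain rule turns depth into a product of sigmoid derivatives, each at most 0.25, so the gradient of stacked sigmoids decays exponentially no matter which algorithm computes it.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def stacked_sigmoid_grad(x, depth):
        # d/dx of sigmoid applied `depth` times, by the chain rule:
        # a product of `depth` sigmoid derivatives, each <= 0.25.
        a, grad = x, 1.0
        for _ in range(depth):
            a = sigmoid(a)
            grad *= a * (1.0 - a)   # sigma'(z) = sigma(z) * (1 - sigma(z))
        return grad

    for depth in (1, 5, 10, 20):
        print(depth, stacked_sigmoid_grad(0.0, depth))
    # 1 -> ~0.25, 5 -> ~7e-4, 10 -> ~4e-7, 20 -> ~1e-13: the exponent creeping in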

[1] Gradients being zero would not be a problem for a global optimization algorithm (which we don't use because they are impractical in high dimensions). Gradients getting very small might be dealt with using tools like line search (if they are small in all directions) or approximate Newton methods (if small in some directions but not others). Not saying those are better solutions in this context, just that optimization (+ modeling) is the actually hard part, not the way gradients are calculated.
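
For illustration only (my sketch, not something from the post): a minimal Armijo backtracking line search picks the step length from the observed decrease in the objective rather than from the raw gradient magnitude, which is one way small-but-nonzero gradients get handled.

    def backtracking_step(f, grad_f, x, t0=1.0, beta=0.5, c=1e-4, max_tries=50):
        # One gradient-descent step on a scalar function f, with the step
        # length chosen by the Armijo sufficient-decrease condition instead
        # of a fixed learning rate tied to the gradient's magnitude.
        g = grad_f(x)
        fx = f(x)
        t = t0
        for _ in range(max_tries):
            if f(x - t * g) <= fx - c * t * g * g:
                return x - t * g
            t *= beta              # shrink until sufficient decrease
        return x                   # no acceptable step found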

xpe ◴[] No.45791131[source]
Yes. No need to be apologetic or timid about it — it’s not a nit to push back against a flawed conceptual framing.

I respect Karpathy’s contributions to the field, but often I find his writing and speaking to be more than imprecise — it is sloppy in the sense that it overreaches and butchers key distinctions. This may sound harsh, but at his level, one is held to a higher standard.

HarHarVeryFunny ◴[] No.45791604[source]
Whoever chose this topic title perhaps did him a disservice in suggesting he said the problem was backprop itself, since in his blog post he immediately clarifies what he meant by it. It's a nice pithy way of stating the issue though.
nirinor ◴[] No.45791895[source]
Nah, Karpathy's title is "Yes you should understand backprop", and his first highlight is "The problem with Backpropagation is that it is a leaky abstraction." That framing is his choice as a communicator, not the HN poster's.

And his _examples_ are about gradients, but nowhere does he distinguish between backpropagation, (part of) an algorithm for automatic differentiation, and the gradients themselves. None of the issues are due to BP returning incorrect gradients (it totally could, for example, lose too much precision, but it doesn't).
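
To make that distinction concrete, here is a toy check (mine, not from the blog post): a backward pass written by hand for sigmoid(w * x) under a squared loss agrees with a finite-difference estimate to floating-point precision, so any vanishing-gradient trouble is a property of the function, not of the differentiation algorithm.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def forward_backward(w, x, y):
        a = sigmoid(w * x)
        loss = 0.5 * (a - y) ** 2
        # backward pass, written by hand (chain rule)
        dloss_dw = (a - y) * a * (1.0 - a) * x
        return loss, dloss_dw

    def numeric_grad(w, x, y, eps=1e-6):
        # central finite differences on the loss, for comparison
        return (forward_backward(w + eps, x, y)[0]
                - forward_backward(w - eps, x, y)[0]) / (2 * eps)

    w, x, y = 0.3, 1.7, 1.0
    print(forward_backward(w, x, y)[1], numeric_grad(w, x, y))  # agree to ~1e-10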

HarHarVeryFunny ◴[] No.45791984[source]
Yeah - he chose it as a pithy/catchy description of the issue, then immediately clarified what he meant by it.

> In other words, it is easy to fall into the trap of abstracting away the learning process — believing that you can simply stack arbitrary layers together and backprop will “magically make them work” on your data.

He then follows this with multiple clear examples of exactly what he is talking about.

The target audience was people building and training neural networks (such as his CS231n students), so I think it's safe to assume they knew what backprop and gradients are, especially since he made them code gradients by hand, which is what they were complaining about!