Diving through the abstraction reveals some of those leaks.
You've also missed the point of the article: if you're building novel model architectures, you can't magic away the leakiness. You need to understand the backprop behaviour of the building blocks you use to achieve a good training run. Ignore this, and what could be a good model architecture with some tweaks will either fail to train entirely or produce disappointing results.
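To make that concrete, here's a minimal illustration of one such behaviour (assuming PyTorch, which neither the article nor this thread specifies): a saturated sigmoid passes almost no gradient back, which is exactly the kind of building-block detail that decides whether a design trains well.

```python
import torch

# Gradient of sigmoid at increasingly large inputs: it collapses toward zero,
# so any layer sitting behind a saturated sigmoid barely learns.
x = torch.tensor([0.0, 2.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # ~[0.25, 0.105, 4.5e-05] -- the gradient has all but vanished at x = 10
```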
Perhaps you're working at the level of bolting pre-built models together or training existing architectures on new datasets, but this course operates below that level to teach you how things actually work.
> As a developer, you just pick the best one and find good hparams for it
It would be more correct to say: "As a developer (not a researcher) whose main goal is to get a good model working, just pick a proven architecture, hyperparameters, and training loop for it."
Because just picking the best optimizer isn't enough. Some of the issues in the article come from the model design, e.g. sigmoids, ReLUs, RNNs. And some of the issues need to be addressed in the training loop, e.g. gradient clipping isn't enabled by default in most DL frameworks.
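As a rough sketch of that last point (assuming PyTorch; the model and hyperparameters here are placeholders), clipping is an extra line you have to add yourself between backward() and the optimizer step:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))  # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

def train_step(x, y):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Not on by default: cap the global gradient norm before stepping,
    # otherwise one bad batch can blow up the weights (common with RNNs).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```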
And it should be noted that the article is addressing people on the academic / research side, who would benefit from a deeper understanding.
Just because the framework you're using provides things like ReLU doesn't mean someone else has done all the work and you can simply drop these in and expect them to work all the time. When things go wrong training a neural net, you need to know where to look and what to look for - things like exploding and vanishing gradients.
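One sketch of "knowing where to look" (again assuming PyTorch; report_grad_norms is a hypothetical helper, not a library function): log per-layer gradient norms right after backward(). Norms collapsing toward zero in the early layers point at vanishing gradients; norms growing by orders of magnitude point at exploding ones.

```python
def report_grad_norms(model):
    # Call immediately after loss.backward() to see how much gradient signal
    # each parameter tensor actually received this step.
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"{name}: grad norm = {param.grad.norm().item():.3e}")
```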