
346 points by swatson741 | 1 comment
jamesblonde ◴[] No.45788699[source]
I have to be contrarian here. The students were right. You didn't need to learn to implement backprop in NumPy. Any leakiness in BackProp is addressed by researchers who introduce new optimizers. As a developer, you just pick the best one and find good hparams for it.
replies(5): >>45788770 #>>45788820 #>>45788864 #>>45788882 #>>45790922 #
1. froobius ◴[] No.45788882[source]
> Any leakiness in BackProp is addressed by researchers who introduce new optimizers

> As a developer, you just pick the best one and find good hparams for it

It would be more correct to say: "As a developer (not a researcher) whose main goal is to get a good model working, just pick a proven architecture, hyperparameters, and training loop." A rough sketch of that path is below.
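For concreteness, a minimal PyTorch sketch of that developer path. Everything here is an illustrative placeholder, not something prescribed by the article or this thread: the architecture, the optimizer choice, and the lr/weight_decay/T_max values are just commonly seen defaults, and weights=None keeps the snippet self-contained (in practice you'd load pretrained weights).

    import torch
    import torchvision

    # Reuse a proven architecture instead of designing one from scratch.
    model = torchvision.models.resnet18(weights=None)
    model.fc = torch.nn.Linear(model.fc.in_features, 10)  # adapt the head to 10 classes

    # Pick a widely used optimizer with widely used hyperparameters.
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)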

Because just picking the best optimizer isn't enough. Some of the issues in the article come from model design, e.g. saturating sigmoids, dying ReLUs, vanilla RNNs, and others have to be handled in the training loop, e.g. gradient clipping, which isn't enabled by default in most DL frameworks (see the sketch below).
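As an illustration of that last point, here is a toy PyTorch training loop for a small RNN, where exploding gradients are a real risk. The sizes, learning rate, and max_norm=1.0 are made-up values; the point is only that the clip_grad_norm_ call has to be written explicitly, because no framework adds it for you.

    import torch
    import torch.nn as nn

    model = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
    head = nn.Linear(64, 1)
    params = list(model.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.MSELoss()

    x = torch.randn(8, 100, 32)  # dummy batch: (batch, seq_len, features)
    y = torch.randn(8, 1)

    for step in range(10):
        opt.zero_grad()
        out, _ = model(x)
        loss = loss_fn(head(out[:, -1]), y)
        loss.backward()
        # The line that's easy to forget: cap the global gradient norm.
        torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
        opt.step()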

And it should be noted that the article is addressed to people on the academic/research side, who benefit most from a deeper understanding.