Diving through the abstraction reveals some of those leaks.
You've also missed the point of the article: if you're building novel model architectures, you can't magic away the leakiness. You need to understand the backprop behaviour of the building blocks you use to achieve a good training run. Ignore this, and what could be a good model architecture with some tweaks will either fail to train entirely or produce disappointing results.
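To make that concrete, here's a minimal illustration of one such behaviour (assuming PyTorch, which neither the article nor this thread specifies): a saturated sigmoid passes almost no gradient back, which is exactly the kind of building-block detail that decides whether a design trains well.

```python
import torch

# Gradient of sigmoid at increasingly large inputs: it collapses toward zero,
# so any layer sitting behind a saturated sigmoid barely learns.
x = torch.tensor([0.0, 2.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # ~[0.25, 0.105, 4.5e-05] -- the gradient has all but vanished at x = 10
```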
Perhaps you're working at the level of bolting pre-built models together or training existing architectures on new datasets, but this course operates below that level to teach you how things actually work.
> As a developer, you just pick the best one and find good hparams for it
It would be more correct to say: "As a developer (not a researcher) whose main goal is to get a good model working, just pick a proven architecture, hyperparameters, and training loop for it."
Because just picking the best optimizer isn't enough. Some of the issues in the article come from the model design, e.g. sigmoids, ReLUs, RNNs. And some of the issues need to be addressed in the training loop, e.g. gradient clipping isn't enabled by default in most DL frameworks.
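As a rough sketch of that last point (assuming PyTorch; the model and hyperparameters here are placeholders), clipping is an extra line you have to add yourself between backward() and the optimizer step:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))  # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

def train_step(x, y):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Not on by default: cap the global gradient norm before stepping,
    # otherwise one bad batch can blow up the weights (common with RNNs).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```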
And it should be noted that the article is addressing people on the academic / research side, who would benefit from a deeper understanding.
Just because the framework you're using provides things like ReLU doesn't mean someone else has done all the work and you can simply drop these in and expect them to work all the time. When things go wrong training a neural net, you need to know where to look and what to look for - things like exploding and vanishing gradients.
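One sketch of "knowing where to look" (again assuming PyTorch; report_grad_norms is a hypothetical helper, not a library function): log per-layer gradient norms right after backward(). Norms collapsing toward zero in the early layers point at vanishing gradients; norms growing by orders of magnitude point at exploding ones.

```python
def report_grad_norms(model):
    # Call immediately after loss.backward() to see how much gradient signal
    # each parameter tensor actually received this step.
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"{name}: grad norm = {param.grad.norm().item():.3e}")
```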