The author might be right, though what I've noticed with DL models is that theory often leads to underwhelming results after training, while "bugs" in models sometimes lead to much better real-world performance, which points to a disconnect between theory and what gradient-based optimization can actually achieve. You can see this in deep reinforcement learning too: in theory the updates should converge, since the Bellman operator of a Markov decision process is a contraction and Banach's fixed-point theorem guarantees a unique fixed point, but in practice the monstrous neural networks that estimate value can override this and change the character of the convergence.
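For a concrete picture of what the Banach argument buys you in the tabular case, here's a minimal sketch (a toy MDP with made-up random dynamics, nothing from a real benchmark): the Bellman optimality operator is a gamma-contraction in the sup norm, so the gap between successive iterates shrinks geometrically toward the unique fixed point. The moment you swap the table for a big network trained by SGD on bootstrapped targets, this argument stops applying to the actual update.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, random dynamics and rewards.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transition probs
R = rng.standard_normal((n_states, n_actions))                    # R[s, a] rewards

def bellman_optimality(V):
    # (T V)(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) * V(s') ]
    return np.max(R + gamma * P @ V, axis=1)

V = np.zeros(n_states)
prev_gap = None
for _ in range(50):
    V_next = bellman_optimality(V)
    gap = np.max(np.abs(V_next - V))  # sup-norm distance between iterates
    if prev_gap is not None and prev_gap > 0:
        # Contraction property: each application of T shrinks the gap
        # by at least a factor of gamma, hence geometric convergence.
        assert gap <= gamma * prev_gap + 1e-12
    prev_gap, V = gap, V_next

print(V)  # approximates the unique fixed point V* promised by Banach
```

With a neural network in place of the table, the effective operator is no longer the exact Bellman backup but a noisy, partially fitted approximation of it, and the contraction bound above simply doesn't hold for the parameters being updated.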
replies(1):