
268 points prashp | 1 comment | HN request time: 0.203s | source
treprinum ◴[] No.39216394[source]
The author might be right. Still, what I've noticed with DL models is that the theory often leads to underwhelming results after training, while "bugs" in models sometimes yield much better real-world performance, pointing to a disconnect between the theory and what gradient-based optimization can actually achieve. You can see the same thing in deep reinforcement learning: in theory the model should converge, since the Markov property makes the Bellman update a contraction and the Banach fixed-point theorem applies, but in practice the monstrous neural networks that estimate rewards can override this and change the character of the convergence.
replies(1): >>39217053 #
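(For readers unfamiliar with the Banach fixed-point argument mentioned above: with exact value tables, the Bellman update is a contraction in the sup norm with factor equal to the discount, so value iteration converges to a unique fixed point. A minimal sketch on a made-up two-state MDP — all numbers are hypothetical, and this is the tabular setting, i.e. exactly the regime where no neural network can break the guarantee:)

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; transition probabilities and
# rewards are invented purely for illustration.
gamma = 0.9                       # discount < 1 makes the update a contraction
P = np.array([                    # P[a, s, s'] = transition probability
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
])
R = np.array([[1.0, 0.0],         # R[a, s] = immediate reward
              [0.5, 2.0]])

def bellman(V):
    # (TV)(s) = max_a [ R(a, s) + gamma * sum_s' P(a, s, s') V(s') ]
    return np.max(R + gamma * (P @ V), axis=0)

# Value iteration: repeated application of the contraction converges
# to the unique fixed point V* = T(V*), by the Banach theorem.
V = np.zeros(2)
for _ in range(500):
    V_new = bellman(V)
    if np.max(np.abs(V_new - V)) < 1e-12:   # sup-norm distance
        break
    V = V_new
```

Once a function approximator replaces the table `V`, the composed update is no longer guaranteed to be a contraction, which is the gap the parent comment is pointing at.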
1. Sharlin ◴[] No.39217053[source]
One interesting example is how the OG algorithm for solving differential equations, the venerable and almost trivial Euler's method, in fact works very well with Stable Diffusion compared to many newer, slower, and fancier solvers. This likely has to do with the fact that we're dealing with an optimization problem rather than actually trying to find accurate solutions to the diffusion DE.
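(For reference, the "almost trivial" method in question is forward Euler: step the state by `y += h * f(t, y)`. A minimal sketch on a toy ODE `dy/dt = -y` with known solution `e^{-t}` — this is not the actual diffusion ODE, just an illustration of the method itself:)

```python
import math

def euler(f, y0, t0, t1, n):
    """Explicit (forward) Euler: y_{k+1} = y_k + h * f(t_k, y_k)."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

# Toy ODE dy/dt = -y starting from y(0) = 1; exact answer is e^{-1}.
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 1000)
```

The method is first-order accurate, so per-step error is large compared to higher-order solvers; the parent's point is that in diffusion sampling that extra accuracy in solving the DE doesn't necessarily translate into better images.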