"Whenever these kind of papers come out I skim it looking for where they actually do backprop.
Check the pseudo code of their algorithms.
"Update using gradient based optimizations""
replies(4):
Maybe you have a way of seeing it differently so that this looks like a gradient? "Gradient" keys my brain into a desired outcome expressed as an expectation function.
Gradient descent is only one way of searching for a minimum, so in that sense it is not necessary; for example, sometimes one can solve analytically for the extrema of the loss. As an alternative, one could do Monte Carlo search instead of gradient descent, though for a convex loss that would of course be less efficient.
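A minimal sketch contrasting the two search strategies on the same convex loss. The quadratic `f`, its known minimum, the step size, and the sample count are all illustrative assumptions, not taken from any paper; for this `f` the minimum could also be written down analytically.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])  # analytic minimizer of f

def f(x):
    # Convex quadratic with its minimum at `target`.
    return np.sum((x - target) ** 2)

def grad_f(x):
    return 2.0 * (x - target)

# Gradient descent: repeatedly step against the gradient.
x = np.zeros(2)
for _ in range(100):
    x = x - 0.1 * grad_f(x)

# Monte Carlo search: propose random candidates, keep the best seen.
best = np.zeros(2)
for _ in range(100):
    candidate = best + rng.normal(scale=0.5, size=2)
    if f(candidate) < f(best):
        best = candidate

print("gradient descent:  ", x, f(x))
print("monte carlo search:", best, f(best))
```

With the same budget of 100 evaluations, the gradient-based run lands essentially on the minimizer, while the random search only gets close, which is the efficiency gap on convex losses the comment refers to.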