
161 points | belleville | 1 comment
itsthecourier ◴[] No.43677688[source]
"Whenever these kind of papers come out I skim it looking for where they actually do backprop.

Check the pseudo code of their algorithms.

"Update using gradient based optimizations""

replies(4): >>43677717 #>>43677878 #>>43684074 #>>43725019 #
f_devd ◴[] No.43677878[source]
I mean, the only claim is no propagation; you always need a gradient of sorts to update parameters, unless you just stumble upon the desired parameters. Even genetic algorithms effectively have gradients, which are obfuscated through random projections.
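
One way to make the "gradient obfuscated through random projections" idea concrete is the evolution-strategies estimator, which is a relative of genetic algorithms rather than a genetic algorithm itself. The sketch below is only an illustration under assumptions: the quadratic `loss`, the perturbation scale `sigma`, and the sample count are all hypothetical choices, not anything from the paper under discussion. It perturbs the parameters along random directions, looks only at loss values, and checks how well the resulting direction lines up with the true gradient.

```python
import random

TARGET = (1.0, -2.0, 0.5)  # hypothetical optimum, chosen only for illustration

def loss(theta):
    """Hypothetical test loss: a quadratic bowl centered at TARGET."""
    return sum((t - c) ** 2 for t, c in zip(theta, TARGET))

def true_grad(theta):
    """Exact gradient of the quadratic bowl, used only for comparison."""
    return [2.0 * (t - c) for t, c in zip(theta, TARGET)]

def es_grad_estimate(f, theta, sigma=0.1, n_samples=2000, seed=0):
    """Estimate the gradient of f from loss values at random perturbations.

    Uses antithetic pairs theta +/- sigma * eps; no derivative of f is evaluated.
    """
    rng = random.Random(seed)
    est = [0.0] * len(theta)
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        slope = (plus - minus) / (2.0 * sigma)  # directional derivative along eps
        for i, e in enumerate(eps):
            est[i] += slope * e / n_samples
    return est

def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(ai * bi for ai, bi in zip(a, b))
    den = (sum(ai * ai for ai in a) ** 0.5) * (sum(bi * bi for bi in b) ** 0.5)
    return num / den

theta = [0.0, 0.0, 0.0]
estimate = es_grad_estimate(loss, theta)
exact = true_grad(theta)
similarity = cosine(estimate, exact)
print("cosine similarity to true gradient:", similarity)
```

With enough samples the estimate points in nearly the same direction as the analytic gradient, which is the sense in which a population of random perturbations can stand in for a gradient.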
replies(3): >>43678034 #>>43679597 #>>43679675 #
erikerikson ◴[] No.43678034[source]
No, you don't. See Hebbian learning ("neurons that fire together wire together"). Bonus: it is one of the biologically plausible options.

Maybe you have a way of seeing it differently so that this looks like a gradient? Gradient keys my brain into a desired outcome expressed as an expectation function.

replies(4): >>43678091 #>>43679021 #>>43680033 #>>43683591 #
1. srean ◴[] No.43683591[source]
Nope, that Hebbian update, which is a rank-one update, is exactly the projected gradient of the reconstruction loss. That's not the way it is usually taught, so Hebbian learning was an unfortunate example.
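
A small numerical check of this reading, as a sketch under assumptions: it takes Oja's normalized variant of the Hebbian rule for a single linear unit (the comment does not name a specific variant) and the reconstruction loss 0.5 * ||x - (w.x) w||^2. All function names here are hypothetical. For a unit-norm weight vector, the Oja direction y * (x - y * w) should match the negative gradient of that loss.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def reconstruction_loss(w, x):
    """0.5 * ||x - (w.x) w||^2 for a single linear unit with output y = w.x."""
    y = dot(w, x)
    return 0.5 * sum((xi - y * wi) ** 2 for xi, wi in zip(x, w))

def numerical_grad(w, x, eps=1e-6):
    """Central-difference gradient of the reconstruction loss with respect to w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        dn = list(w)
        dn[i] -= eps
        grad.append((reconstruction_loss(up, x) - reconstruction_loss(dn, x)) / (2 * eps))
    return grad

def oja_update(w, x):
    """Oja's rule direction: Hebbian term y * x minus the decay term y^2 * w."""
    y = dot(w, x)
    return [y * (xi - y * wi) for xi, wi in zip(x, w)]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5)]
w = [rng.gauss(0.0, 1.0) for _ in range(5)]
norm = dot(w, w) ** 0.5
w = [wi / norm for wi in w]  # the identity is checked on the unit sphere only

grad = numerical_grad(w, x)
oja = oja_update(w, x)
max_gap = max(abs(o + g) for o, g in zip(oja, grad))  # oja should equal -grad
print("largest mismatch between Oja step and -gradient:", max_gap)
```

The match relies on w having unit norm; away from the unit sphere the full gradient picks up an extra term, which is where the "projected" qualifier comes in.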

Gradient descent is only one way of searching for a minimum, so in that sense it is not necessary; for example, one can sometimes solve analytically for the extrema of the loss. As an alternative, one could do Monte Carlo search instead of gradient descent. For a convex loss that would be less efficient, of course.
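
To illustrate the three options side by side, here is a minimal sketch on a hypothetical one-parameter least-squares problem (the data, learning rate, search interval, and sample counts are all assumptions made for the example). The loss is convex, so the closed-form solution, gradient descent, and plain Monte Carlo sampling all head to the same minimum, with the sampling approach getting there much more coarsely for the same number of loss evaluations.

```python
import random

rng = random.Random(0)
# Hypothetical data: y is roughly 3 * x plus a little noise.
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]

def loss(a):
    """Mean squared error (halved) of the one-parameter model y = a * x."""
    return sum((a * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def grad(a):
    """Derivative of the loss with respect to a."""
    return sum((a * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# 1. Analytic solution: set the derivative to zero and solve for a.
a_closed = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# 2. Gradient descent from a = 0 with a fixed step size.
a_gd = 0.0
for _ in range(500):
    a_gd -= 0.1 * grad(a_gd)

# 3. Monte Carlo search: sample candidates uniformly and keep the best one.
candidates = [rng.uniform(-10.0, 10.0) for _ in range(2000)]
a_mc = min(candidates, key=loss)

print("closed form:     ", a_closed)
print("gradient descent:", a_gd)
print("Monte Carlo:     ", a_mc)
```

Here 500 gradient steps agree with the closed form to many digits, while 2000 random samples only land in its neighborhood, which matches the point about efficiency on convex losses.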