Check the pseudocode of their algorithms.
"Update using gradient based optimizations""
Maybe you have a way of seeing it differently so that this looks like a gradient? Gradient keys my brain into a desired outcome expressed as an expectation function.
The one that is not used, because it's inherently unstable?
Learning using locally accessible information is an interesting approach, but it needs to be more complex than "fire together, wire together". And then you might have propagation of information that allows gradients to be approximated locally.
Is there anyone in particular whose work focuses on this that you know of?
I can't recall exactly what the Hebbian update is, but something tells me it minimises the "reconstruction loss", and effectively learns the PCA matrix.
It’s Hebbian and solves all stability problems.
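For concreteness, a minimal sketch of what I take this to mean: Oja's rule, the stabilized Hebbian update. The data, learning rate, and dimensions below are made up for illustration; the point is that the plain Hebbian term alone diverges, while the decay term keeps the weights bounded and drives them toward the first principal component, i.e. the direction minimizing linear reconstruction loss.

```python
import numpy as np

# Oja's rule: w += lr * y * (x - y * w), with y = w . x.
# The lr * y * x part is plain Hebbian ("fire together, wire together")
# and would blow up on its own; the -y^2 * w decay keeps ||w|| near 1
# and makes w converge to the leading eigenvector of the data covariance.

rng = np.random.default_rng(0)

# Toy data with a dominant direction along (3, 1) / sqrt(10) (assumed setup)
true_dir = np.array([3.0, 1.0]) / np.sqrt(10.0)
X = rng.normal(size=(5000, 2)) * 0.1            # small isotropic noise
X += rng.normal(size=(5000, 1)) * true_dir      # strong variance along true_dir

w = rng.normal(size=2)
lr = 0.01
for x in X:
    y = w @ x
    w += lr * y * (x - y * w)  # Hebbian term plus stabilizing decay

# w should align (up to sign) with the first principal component
cos = abs(w @ true_dir) / np.linalg.norm(w)
print(round(float(cos), 2))
```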
GP is essentially isomorphic with beam search where the population is the beam. It is a fancy search algorithm. It is not "training" anything.
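To make the "population = beam" framing concrete, here is a toy sketch of an evolutionary loop written literally as beam search. The problem (OneMax: maximize the number of 1-bits) and all parameters are my own stand-ins, not anything from the paper under discussion: each step expands every beam member by mutation and prunes back to the top-k, which is exactly the select-and-discard step of both algorithms.

```python
import random

# Beam search where the "beam" plays the role of a GP population:
# expand each candidate by mutation, score everything, keep the top-k.
# OneMax (count of 1-bits) is an arbitrary toy fitness function.

random.seed(0)
N, BEAM, STEPS = 32, 8, 200

def score(bits):
    return sum(bits)            # fitness = number of 1s

def mutate(bits):
    i = random.randrange(len(bits))
    child = list(bits)
    child[i] ^= 1               # flip one random bit
    return child

beam = [[random.randint(0, 1) for _ in range(N)] for _ in range(BEAM)]
for _ in range(STEPS):
    # expand: each beam member proposes a few mutated children
    candidates = beam + [mutate(b) for b in beam for _ in range(4)]
    # select: keep the k best, exactly as beam search prunes
    beam = sorted(candidates, key=score, reverse=True)[:BEAM]

best = max(beam, key=score)
print(score(best))
```

Note there is no gradient and no "training" signal anywhere, just search over candidates, which is the point being made above.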
>"We believe this work takes a first step TOWARDS introducing a new family of GRADIENT-FREE learning methods"
I.e. for the time being, the authors can't convince themselves not to take advantage of efficient hardware for taking gradients.
(*Checks that Oxford University is not under sanctions*)
There is no prediction or desired output, certainly not an explicit one. I was playing with those things in my work to try and understand how our brains cause the emergence of intelligence, rather than to solve some classification or related problem. What I managed to replicate was the learning of XOR by some nodes, and further that multidimensional XORs, up to the number of inputs, could be learned.
Perhaps you can say that something PCA-ish is the implicit objective/result, but I still reject that there is any conceptual notion of what a node "should" output, even if iteratively applying the learning rule leads us there.
Gradient descent is only one way of searching for a minimum, so in that sense it is not necessary; for example, one can sometimes analytically solve for the extrema of the loss. As an alternative, one could do Monte Carlo search instead of gradient descent. For a convex loss that would of course be less efficient.
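A tiny sketch of those three options on a convex loss I've made up, f(x) = (x - 3)^2: the analytic minimum (set f'(x) = 0, giving x = 3), gradient descent on f'(x) = 2(x - 3), and plain Monte Carlo search that just samples and keeps the best. All three agree; the random search simply burns far more function evaluations.

```python
import random

random.seed(0)
f = lambda x: (x - 3.0) ** 2   # toy convex loss, minimum at x = 3

# 1) analytic: f'(x) = 2(x - 3) = 0  ->  x = 3
x_analytic = 3.0

# 2) gradient descent on f'(x)
x = 0.0
for _ in range(100):
    x -= 0.1 * 2 * (x - 3.0)   # step along -f'(x)
x_gd = x

# 3) Monte Carlo search: sample uniformly, keep the best candidate seen
x_mc = min((random.uniform(-10, 10) for _ in range(10000)), key=f)

print(round(x_gd, 3), round(x_mc, 2))
```

Note gradient descent gets within machine precision of the minimum in 100 steps, while 10,000 random samples only pin it down to a couple of decimal places.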