itsthecourier No.43677688
"Whenever these kind of papers come out I skim it looking for where they actually do backprop.

Check the pseudocode of their algorithms.

"Update using gradient based optimizations""

f_devd No.43677878
I mean, the only claim is that there is no backpropagation; you always need a gradient of sorts to update parameters, unless you just stumble upon the desired ones. Even genetic algorithms effectively have gradients, which are obfuscated through random projections.
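To make that last point concrete, here is a minimal sketch (the name `es_gradient_estimate` is my own, not from any paper under discussion) of how an OpenAI-ES-style estimator recovers a gradient direction purely from random perturbations and the fitness differences they cause:

```python
import numpy as np

def es_gradient_estimate(f, theta, sigma=0.1, n_samples=100, rng=None):
    """Estimate a gradient of f at theta using only random perturbations.

    No backprop anywhere: the 'gradient' emerges from correlating random
    directions with the fitness differences they produce (OpenAI-ES style).
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.standard_normal(theta.shape)            # random direction
        # Antithetic pair: evaluate both +eps and -eps to reduce variance.
        delta_f = f(theta + sigma * eps) - f(theta - sigma * eps)
        grad += delta_f * eps
    return grad / (2 * sigma * n_samples)

# Toy check: for f(x) = -||x||^2 the analytic gradient at [1, -2] is [-2, 4].
f = lambda x: -np.sum(x ** 2)
print(es_gradient_estimate(f, np.array([1.0, -2.0])))     # roughly [-2, 4]
```

Averaging fitness-weighted random directions like this is exactly the "obfuscated gradient" being described.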
1. bob1029 No.43679597
In genetic algorithms, any gradient found would be implied by way of the fitness function and would not be something the algorithm explicitly pursues. There are no free lunches like there are with the chain rule of calculus.

GP is essentially isomorphic to beam search, where the population is the beam. It is a fancy search algorithm. It is not "training" anything.
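To illustrate the beam-search analogy (a toy sketch with made-up helpers, not any particular GP library): the population is the beam, mutation expands the candidate set, and the fitness function prunes it back to the beam width, with no gradient anywhere.

```python
import random

def beam_search_ga(init_population, fitness, mutate, beam_width=10, generations=50):
    """A genetic-algorithm loop phrased as beam search: the population is the
    beam, mutation expands candidates, selection prunes back to beam_width."""
    beam = list(init_population)
    for _ in range(generations):
        candidates = beam + [mutate(random.choice(beam)) for _ in range(4 * beam_width)]
        beam = sorted(candidates, key=fitness, reverse=True)[:beam_width]
    return beam[0]

# Toy usage: recover a target bit string by mutation and selection alone.
target = [1, 0, 1, 1, 0, 1, 0, 0]
fitness = lambda s: sum(a == b for a, b in zip(s, target))
mutate = lambda s: [b ^ 1 if random.random() < 0.1 else b for b in s]
best = beam_search_ga([[0] * len(target) for _ in range(10)], fitness, mutate)
print(best, fitness(best))
```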

2. f_devd No.43679880
True, in genetic algorithms the gradients are only implied, but those implied gradients are exploited by the more successful evolutionary strategies. So while they might not look like it (because they are not used in a continuous descent), when aggregated they still work very much like regular back-prop gradients, although they represent a smoother function.
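One way to see the "smoother function" point: the aggregated evolutionary-strategies update follows the standard Gaussian-smoothing (NES-style) identity, a general textbook fact rather than anything specific to the paper discussed here, which estimates the gradient of a smoothed version of the fitness F rather than of F itself:

```latex
\nabla_\theta\,\mathbb{E}_{\epsilon \sim \mathcal{N}(0,I)}\!\big[F(\theta + \sigma\epsilon)\big]
  \;=\; \frac{1}{\sigma}\,\mathbb{E}_{\epsilon \sim \mathcal{N}(0,I)}\!\big[F(\theta + \sigma\epsilon)\,\epsilon\big]
  \;\approx\; \frac{1}{n\sigma}\sum_{i=1}^{n} F(\theta + \sigma\epsilon_i)\,\epsilon_i
```

The left-hand side differentiates a Gaussian-smoothed F, which is why the aggregated estimate behaves like a back-prop gradient of a smoother objective.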