
161 points | belleville
gwern (#43677261)
https://www.reddit.com/r/MachineLearning/comments/1jsft3c/r_...

I'm still not quite sure how to think of this. Maybe as being like unrolling a diffusion model, the equivalent of BPTT for RNNs?
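
(For concreteness on the BPTT comparison, here's a toy PyTorch sketch, nothing to do with the paper's actual code and with made-up sizes and loss, of unrolling a recurrent step and backpropagating through the whole unrolled chain; memory grows with the number of steps you unroll.)

    # Toy sketch of BPTT: unroll a recurrent cell over T steps, then
    # backprop through the entire unrolled chain.
    import torch
    import torch.nn as nn

    cell = nn.Linear(16 + 8, 16)   # simple recurrent cell: (h, x) -> h
    opt = torch.optim.SGD(cell.parameters(), lr=1e-2)

    xs = torch.randn(10, 8)        # T = 10 inputs
    h = torch.zeros(16)
    for x in xs:                   # unrolling keeps every step in the graph
        h = torch.tanh(cell(torch.cat([h, x])))
    loss = h.pow(2).mean()         # arbitrary loss, purely illustrative
    opt.zero_grad()
    loss.backward()                # gradient flows back through all 10 steps
    opt.step()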

cttet (#43677696)
In all their experiments, though, backprop is still used for most of their parameters...
hansvm (#43678281)
There is a meaningful distinction. They only use backprop one layer at a time, requiring additional space proportional to that layer. Full backprop requires additional space proportional to the whole network.

It's also a bit interesting as an experimental result, since the core idea doesn't require backprop. Because backprop is just an implementation detail here, you could theoretically swap in other layer types or solvers.
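
Roughly, the distinction looks like this in a PyTorch-style sketch (purely illustrative; the per-layer loss and layer sizes are placeholders, not the paper's actual objective). Detaching each layer's input means backprop never has to hold more than one layer's graph in memory:

    # Minimal sketch: train each layer with its own local loss so that
    # backprop never spans more than one layer.
    import torch
    import torch.nn as nn

    layers = nn.ModuleList([nn.Linear(784, 256),
                            nn.Linear(256, 256),
                            nn.Linear(256, 10)])
    opts = [torch.optim.SGD(l.parameters(), lr=1e-2) for l in layers]

    def local_loss(h):
        # placeholder objective; the real method defines its own per-layer target
        return h.pow(2).mean()

    x = torch.randn(32, 784)
    h = x
    for layer, opt in zip(layers, opts):
        h = torch.relu(layer(h.detach()))  # detach: gradient stops at this layer's input
        loss = local_loss(h)
        opt.zero_grad()
        loss.backward()                    # backprop through this single layer only
        opt.step()

With full end-to-end backprop you'd instead keep the whole forward graph alive and call backward() once on a final loss, which is where the network-sized memory cost comes from.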