(arxiv.org)

161 points belleville | 1 comment | 14 Apr 25 00:03 UTC

gwern ◴[14 Apr 25 01:29 UTC] No.43677261

https://www.reddit.com/r/MachineLearning/comments/1jsft3c/r_...

I'm still not quite sure how to think of this. Maybe as being like unrolling a diffusion model, the equivalent of BPTT for RNNs?

replies(2): >>43677696 #>>43684636 #

1. ActorNightly ◴[14 Apr 25 18:40 UTC] No.43684636

>>43677261 #

I think we need to start thinking about one-shot training. I.e., instead of feeding context into an LLM, you should be able to tell it a fact and have it encode that fact into its updated weights.

NoProp: Training neural networks without back-propagation or forward-propagation