The idea is interesting, but I still don’t understand how this is supposed to solve continual learning in practice.
You’ve got a frozen transformer and a second module that is still trained with SGD, so how exactly does that prevent forgetting rather than just relocating it to the trainable module?