There is also a related YouTube video online: Ali Behrouz of Google Research explaining his poster paper "Nested Learning: The Illusion of Deep Learning Architectures" at NeurIPS 2025. https://www.youtube.com/watch?v=uX12aCdni9Q
This still seems like gradient descent wrapped in new terminology. If all learning happens through weight updates, it's just rearranging where the forgetting happens.
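Concretely, my reading of the "nested" part is just parameter groups updated by plain SGD at different frequencies, so forgetting is redistributed across timescales rather than removed. A toy sketch of that reading (all names and constants here are illustrative, not the paper's code):

```python
import numpy as np

# Minimal sketch, assuming "nested learning" ~= the same gradient signal
# driving SGD at two update frequencies. Not the paper's actual method.

rng = np.random.default_rng(0)
DIM, STEPS, SLOW_PERIOD = 4, 100, 10
LR_FAST, LR_SLOW = 0.1, 0.01

w_fast = np.zeros(DIM)          # inner level: updated every step
w_slow = np.zeros(DIM)          # outer level: updated every SLOW_PERIOD steps
slow_grad_accum = np.zeros(DIM)

for step in range(STEPS):
    x = rng.normal(size=DIM)           # toy input
    target = x.sum()                   # toy regression target
    pred = x @ (w_fast + w_slow)       # both levels contribute to the output
    grad = 2.0 * (pred - target) * x   # gradient of squared error w.r.t. weights

    # Inner loop: ordinary SGD every step -- this is where recent
    # information rapidly overwrites older information ("fast forgetting").
    w_fast -= LR_FAST * grad

    # Outer loop: the same gradient, integrated over a longer window and
    # applied rarely -- forgetting is slower here, but still happens.
    slow_grad_accum += grad
    if (step + 1) % SLOW_PERIOD == 0:
        w_slow -= LR_SLOW * slow_grad_accum / SLOW_PERIOD
        slow_grad_accum[:] = 0.0
```

Both levels are driven by the same gradient descent step; the only difference is how often each one overwrites its state, which is exactly the "rearranging where the forgetting happens" point.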