https://arxiv.org/abs/2502.09992
https://www.inceptionlabs.ai/news
(these are results from two different teams/orgs)
It sounds kind of like what you're describing, and nobody else has mentioned it yet, so take a look and see whether it's relevant.
In exchange, he offers the idea that we should have an 'energy minimization' architecture; as I understand it, this would assign an 'energy' to an entire response, and training would try to minimize that.
Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.
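To make the "energy of an entire response" idea concrete for myself, here is a toy sketch. It is not LeCun's actual proposal, and every name in it (features, energy, train) is made up for illustration. It only shows the general shape: one scalar score per whole response, trained with a margin loss so that a good response ends up with lower energy than a bad one, and "generation" reduced to picking the lowest-energy candidate.

```python
# Toy energy-based scorer: one scalar energy per whole response.
# All names and features here are hypothetical, chosen only for illustration.

def features(response):
    """Hypothetical whole-response features: scaled length, adjacent repeated words."""
    words = response.split()
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return [len(words) / 10.0, float(repeats)]

def energy(weights, response):
    """Scalar energy of the entire response; lower means 'better'."""
    return sum(w * f for w, f in zip(weights, features(response)))

def train(pairs, steps=100, lr=0.1, margin=1.0):
    """Margin loss: push energy(good) below energy(bad) by at least `margin`."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        for good, bad in pairs:
            if energy(weights, good) + margin > energy(weights, bad):
                fg, fb = features(good), features(bad)
                # Gradient of E(good) - E(bad) with respect to weights is fg - fb.
                weights = [w - lr * (g - b) for w, g, b in zip(weights, fg, fb)]
    return weights

# One (good, bad) training pair; the bad response stutters.
pairs = [("the cat sat on the mat", "the the cat cat sat sat")]
weights = train(pairs)

# Inference: score complete candidate responses and keep the lowest-energy one.
candidates = ["a dog dog ran ran away", "a dog ran far away"]
best = min(candidates, key=lambda r: energy(weights, r))
print(best)
```

The point of the sketch is only that the score is attached to the complete response rather than to each next token; a real system would presumably learn the features and search over responses instead of ranking a fixed candidate list.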
[1] Which Inception Labs's new models may be based on; one of the cofounders is a co-author. See equations 18-20 in https://arxiv.org/abs/2310.16834