In exchange, he offers the idea that we should have something like an 'energy minimization' architecture; as I understand it, this would assign an 'energy' to an entire response, and training would try to minimize that energy.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much from his group after the release of I-JEPA.
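To make my (possibly wrong) reading concrete, here is a minimal sketch of the energy-based framing: a scalar energy E(x, y) scores how compatible a response y is with an input x, and training pushes energy down on observed pairs while pushing it up on mismatched ones. The network shape, margin value, and negative-sampling scheme are my own illustrative assumptions, not anything from LeCun's actual formulation.

```python
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Toy energy function E(x, y): lower energy = more compatible pair."""
    def __init__(self, dim: int = 16):
        super().__init__()
        # Maps a concatenated (input, response) pair to a single scalar energy.
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def margin_loss(model, x, y_pos, y_neg, margin: float = 1.0):
    # Contrastive objective (one of several possibilities): low energy for
    # compatible pairs, energy at least `margin` higher for mismatched ones.
    e_pos = model(x, y_pos)
    e_neg = model(x, y_neg)
    return (e_pos + torch.relu(margin - e_neg)).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = EnergyModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(200):
        x = torch.randn(32, 16)
        y_pos = x + 0.1 * torch.randn(32, 16)   # "compatible" responses
        y_neg = torch.randn(32, 16)             # mismatched responses
        loss = margin_loss(model, x, y_pos, y_neg)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final loss: {loss.item():.3f}")
```

At inference time the idea, as I understand it, is to search over candidate responses for the one with the lowest energy, rather than generating token by token, but I'd welcome corrections from people who follow this work closely.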
Disclosure: I am the author of this paper.
Reference: Hydra: Enhancing Machine Learning with a Multi-head Predictions Architecture. ResearchGate: https://www.researchgate.net/publication/381009719_Hydra_Enh... [accessed Mar 14, 2025].
As I discuss in the paper, predictive coding suggests that the brain actively generates predictions and compares them to incoming sensory data (vision, hearing, etc.), prioritizing anomalies. Its efficiency stems from a hierarchical memory system that continuously updates only the "deltas"—the differences that matter. Embracing this approach could lead to a paradigm shift, enabling the development of significantly more energy-efficient AI in the future.
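To illustrate the "update only the deltas" idea, here is a toy sketch of a single predictive-coding loop: a belief vector predicts the incoming signal, and only prediction errors above a salience threshold drive updates. This is not the architecture from the paper; the threshold, learning rate, and single-level setup are simplifying assumptions purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 8
belief = np.zeros(dim)   # top-down prediction state
threshold = 0.05         # deltas below this are treated as "expected" noise
lr = 0.2

for t in range(100):
    # Incoming "sensory" signal: a fixed pattern plus small noise.
    signal = np.sin(np.linspace(0, np.pi, dim)) + 0.01 * rng.normal(size=dim)
    prediction = belief                     # top-down prediction
    delta = signal - prediction             # prediction error
    salient = np.abs(delta) > threshold     # anomalies worth processing
    belief = belief + lr * delta * salient  # update only where deltas matter

print("remaining mean |delta|:", float(np.mean(np.abs(signal - belief))))
```

Once the belief matches the signal, almost no deltas cross the threshold and the loop does very little work, which is the intuition behind the energy-efficiency claim.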