Ask HN: Any insider takes on Yann LeCun's push against current architectures?

385 points | vessenes | 2 comments | 10 Mar 25 19:41 UTC

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at each step leads to runaway errors that can't be damped mathematically.
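To make the runaway-error argument concrete, here is a minimal sketch of the usual back-of-the-envelope version: if each token independently goes off track with probability e, the chance that an n-token response stays on track is (1 - e)^n. The independence assumption and the function name `p_stays_correct` are mine, for illustration only; whether per-token errors really are independent and unrecoverable is exactly what critics of the argument dispute.

```python
def p_stays_correct(per_token_error: float, n_tokens: int) -> float:
    """Probability that every one of n_tokens stays on track.

    Assumes (strongly) that each token errs independently with the same
    probability and that a single error is never corrected later.
    """
    if not 0.0 <= per_token_error <= 1.0:
        raise ValueError("per_token_error must be in [0, 1]")
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return (1.0 - per_token_error) ** n_tokens


if __name__ == "__main__":
    # Show how quickly the probability decays with response length
    # for a hypothetical 1% per-token error rate.
    for n in (10, 100, 1000):
        print(n, p_stays_correct(0.01, n))
```

Under these assumptions the probability shrinks exponentially with length, which is the "can't be damped" part of the claim.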

Instead, he proposes an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.
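A toy sketch of what "energy of an entire response" could mean at inference time: score each whole candidate response with a scalar energy and pick the lowest. Everything here is hypothetical. `toy_energy` is a hand-written stand-in (a real energy-based model would learn the energy function, and would likely search a continuous representation space rather than a fixed candidate list), so this illustrates only the "score the whole output, then minimize" idea, not LeCun's actual design.

```python
from typing import Callable, Sequence


def toy_energy(prompt: str, response: str) -> float:
    """Hypothetical energy: lower means the response fits the prompt better.

    Here it is simply the number of prompt words missing from the response.
    A real model would learn this function from data.
    """
    prompt_words = set(prompt.lower().split())
    response_words = set(response.lower().split())
    return float(len(prompt_words - response_words))


def min_energy_response(
    prompt: str,
    candidates: Sequence[str],
    energy: Callable[[str, str], float] = toy_energy,
) -> str:
    """Return the candidate whose whole-response energy is lowest."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return min(candidates, key=lambda response: energy(prompt, response))


if __name__ == "__main__":
    print(min_energy_response("sky blue", ["red car", "blue sky"]))
```

The contrast with token-by-token decoding is that the score is assigned to the complete response, so no single early choice is locked in.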

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.

bobosha ◴[14 Mar 25 20:39 UTC] No.43367047[source]▶

>>43325049 (OP) #

I argue that JEPA and its Energy-Based Model (EBM) framework fail to capture the deeply intertwined nature of learning and prediction in the human brain—the “yin and yang” of intelligence. Contemporary machine learning approaches remain heavily reliant on resource-intensive, front-loaded training phases. I advocate for a paradigm shift toward seamlessly integrating training and prediction, aligning with the principles of online learning.

Disclosure: I am the author of this paper.

Reference: (PDF) Hydra: Enhancing Machine Learning with a Multi-head Predictions Architecture. Available from: https://www.researchgate.net/publication/381009719_Hydra_Enh... [accessed Mar 14, 2025].

replies(3): >>43367244 #>>43367312 #>>43367329 #

1. vessenes ◴[14 Mar 25 21:08 UTC] No.43367329[source]▶

>>43367047 #

Update: Interesting paper, thanks. A comment on selection for Hydra: you mention v1 uses an arithmetic mean across timescales for prediction. Taking the analogy of longer windows encapsulating different timescales, it would be interesting to train a layer to predict the weighting of the timescale predictions. Essentially: is this a moment where I need to focus on what just happened, or one in which my long-range predictions are more important?
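A rough sketch of the difference between the two combination rules, assuming each timescale head emits a single scalar prediction. This is not Hydra's code: `gate_scores` stands in for the output of a hypothetical trained gating layer and is supplied by hand here, and the function names are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_combine(predictions: Sequence[float]) -> float:
    """Arithmetic mean across timescale heads (the v1 rule described above)."""
    return sum(predictions) / len(predictions)


def gated_combine(predictions: Sequence[float], gate_scores: Sequence[float]) -> float:
    """Context-dependent weighted average of the per-timescale predictions.

    gate_scores would come from a trained layer that looks at the current
    context; here they are plain numbers passed in by the caller.
    """
    if len(predictions) != len(gate_scores):
        raise ValueError("need one gate score per prediction")
    weights = softmax(gate_scores)
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    heads = [1.0, 2.0, 3.0]  # short, medium, long window predictions
    print(mean_combine(heads))
    print(gated_combine(heads, [3.0, 0.0, 0.0]))  # gate favors the short window
```

With equal gate scores the gated rule reduces to the arithmetic mean, so the mean can be seen as the special case where the gate has no preference.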

replies(1): >>43371833 #

2. bobosha ◴[15 Mar 25 11:31 UTC] No.43371833[source]▶

>>43367329 (TP) #

Ty for reading the paper! I completely agree! Assigning soft weights to the window based on context is a fascinating research area. This concept is similar to Ebbinghaus' forgetting curve, which emphasizes recency bias while requiring repeated exposure for long-term retention.
