
385 points | vessenes | 1 comment

So, LeCun has been quite public about his belief that LLMs will never fix hallucinations because, essentially, choosing a token at each step compounds errors over the length of the output, and those errors can't be damped mathematically.
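A toy version of that compounding argument (my paraphrase, not LeCun's actual math): if each token independently has some probability of staying "on track", the chance that a long answer stays on track decays exponentially with length.

    # Toy illustration only; p_per_token is a made-up number for the example.
    p_per_token = 0.999
    for n in (10, 100, 1000, 10000):
        print(n, p_per_token ** n)
    # 10 -> ~0.990, 100 -> ~0.905, 1000 -> ~0.368, 10000 -> ~4.5e-05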

In its place, he proposes an 'energy minimization' architecture: as I understand it, the model would assign an 'energy' to an entire response, and training would try to minimize that energy.
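In code, I picture it as something like the sketch below, where energy_model and candidates are hypothetical placeholders rather than anything LeCun has actually specified: score whole candidate answers and keep the lowest-energy one, instead of committing to one token at a time.

    # Hypothetical sketch of the energy-based framing as I understand it.
    def pick_answer(prompt, candidates, energy_model):
        # lower energy = more compatible prompt/answer pair
        return min(candidates, key=lambda c: energy_model(prompt, c))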

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.

1. AlexCoventry | No.43369202
Not an insider, but:

I don't know about you, but I certainly don't generate text autoregressively, token by token. Also, pretty sure I don't learn by global updates based on taking the derivative of some objective function of my behavior with respect to every parameter defining my brain. So there's good biological reason to think we can go beyond the capabilities of current architectures.
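For concreteness, the kind of "global update" being contrasted with biological learning is ordinary gradient descent on a scalar objective, applied to every parameter at once; a toy example, not any particular model:

    import torch

    params = torch.randn(1000, requires_grad=True)   # stand-in for "every parameter"
    objective = ((params - 1.0) ** 2).sum()          # some global objective over behavior
    objective.backward()                             # derivative w.r.t. every parameter
    with torch.no_grad():
        params -= 0.01 * params.grad                 # one global update step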

I think a likely example of the kind of new architecture he supports is FB's Large Concept Models [1]. It's still a self-attention, autoregressive architecture, but the unit of regression is a sentence rather than a token: sentences are mapped into a latent space via an autoencoder, and a transformer then operates on sequences whose "tokens" are points in that latent space.
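A very rough sketch of that shape, not the paper's actual architecture (the paper builds on a pretrained sentence embedding space; everything below, including dimensions, class names, and the MSE objective, is an illustrative stand-in): run a causal transformer over sentence embeddings and regress the next one.

    import torch
    import torch.nn as nn

    class ToySentenceLevelLM(nn.Module):
        def __init__(self, d_model=512, n_heads=8, n_layers=4):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, d_model)  # predict next sentence embedding

        def forward(self, sent_embs):  # (batch, n_sentences, d_model)
            mask = nn.Transformer.generate_square_subsequent_mask(sent_embs.size(1))
            return self.head(self.backbone(sent_embs, mask=mask))

    embs = torch.randn(2, 10, 512)                   # stand-in for encoded sentences
    pred = ToySentenceLevelLM()(embs)
    loss = nn.functional.mse_loss(pred[:, :-1], embs[:, 1:])  # regress onto the next sentence's embedding

Decoding a predicted latent back into text would then be the job of the paired decoder, rather than a softmax over a token vocabulary.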

[1] https://arxiv.org/abs/2412.08821