
385 points by vessenes | 5 comments

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, the token-by-token choice at each step leads to runaway errors -- these can't be damped mathematically.
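If I have the argument right, a small per-token error probability compounds over the length of a generation. A toy back-of-the-envelope in Python (my numbers, purely illustrative, not his):

    # Purely illustrative: if each generated token independently has
    # probability e of going "off the rails", the chance an n-token
    # response stays entirely on track decays geometrically.
    for e in (0.001, 0.01):
        for n in (100, 1000, 10000):
            print(f"per-token error {e}, length {n}: P(clean) = {(1 - e) ** n:.3g}")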

Instead, he offers the idea of an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.
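For what it's worth, the cartoon I have in my head of the difference looks something like this -- entirely my guess at the shape of the idea, with a word-overlap heuristic standing in for a learned energy function, not anything from his papers:

    # Cartoon only: "energy" is a scalar compatibility score over a whole
    # (prompt, response) pair, lower = better. A real EBM would learn this
    # with a network; the word-overlap heuristic is just a stand-in.
    def energy(prompt: str, response: str) -> float:
        overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
        return -float(overlap)

    def pick_response(prompt: str, candidates: list[str]) -> str:
        # Score complete responses and take the global minimum, instead of
        # committing to one token at a time and never revisiting it.
        return min(candidates, key=lambda r: energy(prompt, r))

    print(pick_response(
        "why is the sky blue",
        ["The sky is blue because of Rayleigh scattering.", "Bananas are yellow."],
    ))

The contrast, as I read it, is that the objective is defined over the whole output rather than accumulated one token at a time -- but I may be misreading the proposal.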

1. bashfulpup ◴[] No.43367802[source]
He's right but at the same time wrong. Current AI methods are essentially scaled-up versions of methods we learned decades ago.

These long-horizon (AGI) problems have been there since the very beginning, and we have never had a solution to them. RL assumes we know the future, which is a poor proxy. These energy-based methods fundamentally do very little that an RNN didn't do long ago.

I worked on higher-dimensionality methods, which is a very different angle. My take is that it's about the way we scale dependencies between connections. The human brain makes and breaks a massive number of neuron connections daily. Scaling the dimensionality would imply that a single connection could be scaled to encompass significantly more "thoughts" over time.

Additionally, the true solution to these problems is as likely to be found by a kid with a laptop as by a top researcher. If you find the solution to CL (continual learning) on a small AI model (MNIST), you solve it at all scales.
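The standard small-scale testbed for CL is something like split MNIST, where the failure mode, catastrophic forgetting, shows up immediately. A rough sketch of that experiment, assuming PyTorch and torchvision; my own setup, not a reference implementation:

    # Split MNIST: train on digits 0-4, then on 5-9, and watch accuracy on
    # 0-4 collapse (catastrophic forgetting).
    import torch, torch.nn as nn
    from torch.utils.data import DataLoader, Subset
    from torchvision import datasets, transforms

    tfm = transforms.ToTensor()
    train_ds = datasets.MNIST(".", train=True, download=True, transform=tfm)
    test_ds = datasets.MNIST(".", train=False, download=True, transform=tfm)

    def subset(ds, digits):
        return Subset(ds, [i for i, t in enumerate(ds.targets) if t.item() in digits])

    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def train_task(digits):
        for x, y in DataLoader(subset(train_ds, digits), batch_size=128, shuffle=True):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    def accuracy(digits):
        hits, total = 0, 0
        with torch.no_grad():
            for x, y in DataLoader(subset(test_ds, digits), batch_size=512):
                hits += (model(x).argmax(1) == y).sum().item()
                total += y.numel()
        return hits / total

    task_a, task_b = {0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}
    train_task(task_a)
    print("task A accuracy after A:", accuracy(task_a))
    train_task(task_b)
    print("task A accuracy after B:", accuracy(task_a))  # typically craters

Anything that keeps that last number high without replaying the old data counts as solving CL at small scale; my claim is that the same mechanism should carry to larger models.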

replies(2): >>43367830 #>>43368066 #
2. nradov ◴[] No.43367830[source]
For a kid with a laptop to solve it, the problem would have to be solvable with current standard hardware. There's no evidence for that. We might need a completely different hardware paradigm.
replies(1): >>43368179 #
3. haolez ◴[] No.43368066[source]
Not exactly related, but I sometimes wonder if the fact that the weights in current models are very expensive to change is a feature and not a "bug".

Somehow, it feels harder to trust a model that could evolve over time. Its performance might even degrade. That's a steep price to pay for having memory built in and a (possibly) self-evolving model.

replies(1): >>43368190 #
4. bashfulpup ◴[] No.43368179[source]
Also possible and a fair point. My point is that it's a "tiny" solution that we can scale.

I could revise that by saying a kid with a whiteboard.

It's an Einstein×10 moment, so who knows when that'll happen.

5. bashfulpup ◴[] No.43368190[source]
We degrade, and I think we are far more valuable than one model.