385 points by vessenes | 1 comment

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at a time compounds small errors into runaway errors -- and these can't be damped mathematically.
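
To make the compounding intuition concrete, here's a toy calculation (my own illustration, not LeCun's actual math) that assumes an independent per-token error probability, which real models don't satisfy:

    # Toy illustration of compounding per-token errors (assumes independence,
    # which real autoregressive models do not satisfy).
    def p_sequence_ok(per_token_error: float, n_tokens: int) -> float:
        return (1.0 - per_token_error) ** n_tokens

    for e in (0.001, 0.01, 0.05):
        print(f"e={e}: 100 tokens -> {p_sequence_ok(e, 100):.3f}, "
              f"1000 tokens -> {p_sequence_ok(e, 1000):.3f}")

Even a small per-token error rate drives the chance of a long response staying on track towards zero, which is roughly the argument as I understand it.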

Instead, he offers the idea that we should have an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.
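
For intuition only, here's a minimal sketch of 'score the whole response' rather than scoring token by token; energy_fn, candidates, and the dummy energy are all made-up placeholders, not anything from LeCun's actual proposals:

    # Minimal sketch of "pick the lowest-energy whole response".
    # energy_fn stands in for a learned model that maps
    # (prompt, response) -> scalar energy; lower means more compatible.
    from typing import Callable, Sequence

    def min_energy_response(prompt: str,
                            candidates: Sequence[str],
                            energy_fn: Callable[[str, str], float]) -> str:
        return min(candidates, key=lambda r: energy_fn(prompt, r))

    # Dummy energy: length mismatch between prompt and response.
    dummy_energy = lambda p, r: abs(len(r) - len(p))
    print(min_energy_response("a prompt of some length",
                              ["short", "a response of similar length"],
                              dummy_energy))

How to train such an energy function, and how to search the space of candidate responses efficiently, are the parts I don't understand.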

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think of LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.

eximius ◴[] No.43367519[source]
I believe that so long as weights are fixed at inference time, we'll be at a dead end.

Will Titans be sufficiently "neuroplastic" to escape that? Maybe, I'm not sure.

Ultimately, I think what will be required is an architecture built around 'looping', where the model's outputs are both some form of 'self update' and 'optional actionality', so that interacting with the model is more like 'sampling from a thought space'.
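
Something like this loop is what I have in mind; all the names here (State, step, run_loop) are hypothetical, just to show 'self update plus optional action' in code:

    # Rough sketch of "looping": each step returns an updated internal state
    # plus an optional externally visible action. Names are hypothetical.
    from dataclasses import dataclass
    from typing import Callable, Optional, Tuple

    @dataclass
    class State:
        memory: list  # stand-in for a latent "thought space"

    def run_loop(step: Callable[[State], Tuple[State, Optional[str]]],
                 state: State, max_steps: int = 10) -> list:
        actions = []
        for _ in range(max_steps):
            state, action = step(state)   # "self update" + "optional actionality"
            if action is not None:
                actions.append(action)    # only some iterations act externally
        return actions

    # Dummy step: "think" silently, then emit a single action on the third pass.
    def dummy_step(s: State):
        s.memory.append("thought")
        return s, ("speak" if len(s.memory) == 3 else None)

    print(run_loop(dummy_step, State(memory=[])))  # -> ['speak']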

replies(3): >>43367644 #>>43370757 #>>43372112 #
mft_ ◴[] No.43367644[source]
Very much this. I’ve been wondering why I’ve not seen it much discussed.
replies(2): >>43368224 #>>43369295 #
jononor ◴[] No.43368224[source]
There are still many roadblocks to continual learning. Most current models and training paradigms are very vulnerable to catastrophic forgetting, and they are very sample-inefficient. And neither we nor the methods are very good at separating what is "interesting" (should be learned) from what is not. But this is being researched, for example under topics like open-ended learning, active inference, etc.
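
A tiny way to see catastrophic forgetting in practice (a sketch assuming scikit-learn is available; the two "tasks" are just arbitrary splits of the digits dataset): train on task A, then on task B alone, and task-A accuracy typically collapses.

    # Sequential training without any replay: accuracy on the first task
    # usually drops sharply after training on the second one.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import SGDClassifier

    X, y = load_digits(return_X_y=True)
    task_a, task_b = (y < 5), (y >= 5)
    clf = SGDClassifier(random_state=0)

    # Task A first.
    clf.partial_fit(X[task_a], y[task_a], classes=np.arange(10))
    for _ in range(20):
        clf.partial_fit(X[task_a], y[task_a])
    acc_a_before = clf.score(X[task_a], y[task_a])

    # Then task B only, with no access to task A data.
    for _ in range(20):
        clf.partial_fit(X[task_b], y[task_b])
    acc_a_after = clf.score(X[task_a], y[task_a])

    print(f"task A accuracy: {acc_a_before:.2f} before, {acc_a_after:.2f} after task B")
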
replies(1): >>43372137 #
chriskanan ◴[] No.43372137[source]
As a leader in the field of continual learning, I somewhat agree, but I'd say that catastrophic forgetting is largely resolved. The problem is that the continual learning community has become insular and is mostly focusing on toy problems that don't matter, to the point of avoiding good solutions for nonsensical reasons. For example, reactivation / replay / rehearsal mitigates catastrophic forgetting almost entirely, but a lot of the continual learning community dislikes it because it is very effective. A lot of the work focuses on toy problems and refuses to scale up.

I wrote a paper with some of my colleagues on this issue, although with such a long author list it isn't as focused as I would have liked. The point was to push the continual learning community to get out of its rut and write papers that advance AI, rather than papers written only for other continual learning researchers: https://arxiv.org/abs/2311.11908
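
To be concrete about what replay / rehearsal means here, a bare-bones sketch (ReplayBuffer and mixed_batch are illustrative names, not the method from the paper): keep a bounded reservoir of past examples and mix some of them into every batch of new data.

    # Bare-bones replay/rehearsal sketch: a bounded buffer of past examples,
    # mixed into each batch of new data. Illustrative only.
    import random

    class ReplayBuffer:
        def __init__(self, capacity: int = 1000):
            self.capacity = capacity
            self.items = []
            self.seen = 0

        def add(self, example) -> None:
            # Reservoir sampling keeps a uniform sample of everything seen so far.
            self.seen += 1
            if len(self.items) < self.capacity:
                self.items.append(example)
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.items[j] = example

        def sample(self, k: int) -> list:
            return random.sample(self.items, min(k, len(self.items)))

    def mixed_batch(new_examples: list, buffer: ReplayBuffer,
                    replay_frac: float = 0.5) -> list:
        # Train on new data plus a slice of old data, then remember the new data.
        batch = list(new_examples) + buffer.sample(int(len(new_examples) * replay_frac))
        for ex in new_examples:
            buffer.add(ex)
        return batch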

The majority are focusing on the wrong paradigms and the wrong questions, which blocks progress towards the kind of continual learning needed to create models that think in latent space and to enable meta-cognition, which would in turn give architectures the ability to avoid hallucinations by knowing what they don't know.
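
As a rough picture of 'knowing what they don't know' (a hedged sketch, far simpler than real meta-cognition; the threshold and names are illustrative): answer only when the predictive distribution is confident, otherwise abstain.

    # Entropy-based abstention: refuse to answer when the model's predictive
    # distribution is too flat. Threshold and names are illustrative.
    import math

    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0.0)

    def answer_or_abstain(probs, labels, max_entropy: float = 0.5):
        if entropy(probs) > max_entropy:
            return "I don't know"          # abstain instead of guessing
        return labels[max(range(len(probs)), key=probs.__getitem__)]

    print(answer_or_abstain([0.9, 0.05, 0.05], ["A", "B", "C"]))   # confident -> 'A'
    print(answer_or_abstain([0.4, 0.35, 0.25], ["A", "B", "C"]))   # uncertain -> abstains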

replies(2): >>43372436 #>>43374007 #
jononor ◴[] No.43374007[source]
Thanks a lot for this paper and the ones you shared deeper in the thread!