
385 points by vessenes | source

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, picking a token at each step leads to runaway errors -- errors that can't be damped mathematically.

Instead, he offers the idea that we should have an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much since the release of I-JEPA from his group.
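
To make the contrast concrete, here's a toy sketch of how I read it (which may well be wrong): commit to one token at a time versus score complete candidate responses with an energy function and keep the minimum. The fake LM head, the `energy` function, and the candidate set are all made up for illustration -- this is closer to re-ranking than to whatever architecture LeCun actually has in mind.

    import random

    random.seed(0)

    VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

    def next_token_probs(prefix):
        # Stand-in for an autoregressive LM head: a made-up distribution
        # that depends only on how long the prefix is.
        weights = [((i + len(prefix)) % 5) + 1 for i in range(len(VOCAB))]
        total = sum(weights)
        return [w / total for w in weights]

    def sample_autoregressive(n_tokens=5):
        # Commit to one token at a time; an early bad pick is never revisited.
        out = []
        for _ in range(n_tokens):
            out.append(random.choices(VOCAB, next_token_probs(out))[0])
        return out

    def energy(response):
        # Stand-in for a learned energy over the *whole* response:
        # lower = "better"; here we just penalise repeated tokens.
        return len(response) - len(set(response))

    def sample_energy_minimizing(n_candidates=64, n_tokens=5):
        # Score complete candidates globally and keep the minimum-energy one,
        # instead of committing token by token.
        candidates = [sample_autoregressive(n_tokens) for _ in range(n_candidates)]
        return min(candidates, key=energy)

    print("token-by-token:", sample_autoregressive())
    print("energy-min:    ", sample_energy_minimizing())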

eximius ◴[] No.43367519[source]
I believe that so long as weights are fixed at inference time, we'll be at a dead end.

Will Titans be sufficiently "neuroplastic" to escape that? Maybe, I'm not sure.

Ultimately, I think what will be required is an architecture built around "looping", where the model's outputs are both some form of "self-update" and an "optional action", so that interacting with the model is more like "sampling from a thought space".
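
A rough sketch of the shape I have in mind, with made-up names (`ThoughtState`, `step`) standing in for a real model -- just to make the loop concrete, not any published architecture:

    import random
    from dataclasses import dataclass, field

    random.seed(0)

    @dataclass
    class ThoughtState:
        # The latent "thought space" the loop keeps updating and sampling from.
        vector: list = field(default_factory=lambda: [0.0] * 8)

    def step(state, observation):
        # Stand-in for the model: each pass returns (self_update, optional_action).
        update = [v + observation + random.gauss(0, 0.1) for v in state.vector]
        action = "respond" if sum(update) > 4.0 else None   # acting is optional
        return update, action

    def run_loop(observation, max_steps=20):
        state = ThoughtState()
        for i in range(max_steps):
            update, action = step(state, observation)
            state.vector = update            # the output *is* the self-update
            if action is not None:           # only sometimes does it act externally
                return i, action, state
        return max_steps, None, state

    print(run_loop(0.3))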

replies(3): >>43367644 #>>43370757 #>>43372112 #
mft_ ◴[] No.43367644[source]
Very much this. I’ve been wondering why I’ve not seen it much discussed.
replies(2): >>43368224 #>>43369295 #
jononor ◴[] No.43368224[source]
There are still many roadblocks to continual learning. Most current models and training paradigms are very vulnerable to catastrophic forgetting, and they are very sample-inefficient. We (and our methods) are also not very good at separating what is "interesting" (and should be learned) from what is not. But this is being researched, for example under topics such as open-ended learning and active inference.
replies(1): >>43372137 #
chriskanan ◴[] No.43372137[source]
As a leader in the field of continual learning, I somewhat agree, but I'd say that catastrophic forgetting is largely resolved. The problem is that the continual learning community has become insular and mostly focuses on toy problems that don't matter, to the point where it will avoid good solutions for nonsensical reasons. For example, reactivation / replay / rehearsal mitigates catastrophic forgetting almost entirely, but much of the continual learning community dislikes it precisely because it is so effective. A lot of the work focuses on toy problems and refuses to scale up. I wrote this paper with some of my colleagues on the issue, although with such a long author list it isn't as focused as I would have liked in terms of telling the continual learning community to get out of its rut and write papers that advance AI, rather than papers written only for other continual learning researchers: https://arxiv.org/abs/2311.11908
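
To illustrate what replay buys you, here is a deliberately tiny rehearsal sketch on a toy two-task problem -- the logistic-regression "model" and Gaussian-blob tasks are stand-ins of mine, not the setup from any real benchmark or from the paper above. Training on task B alone overwrites task A; mixing a small buffer of stored task-A examples into the task-B phase largely preserves it.

    import random, math

    random.seed(0)

    def make_task(c0, c1, n=200):
        # Two Gaussian blobs: label 0 around centre c0, label 1 around centre c1.
        data = []
        for _ in range(n):
            y = random.randint(0, 1)
            cx, cy = c1 if y else c0
            data.append(((cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)), y))
        return data

    task_a = make_task((-2, 0), (2, 0))    # task A: class depends on the x-axis
    task_b = make_task((1, -2), (-1, 2))   # task B: its optimal weights conflict with A's

    def train(model, data, epochs=20, lr=0.1):
        # Plain SGD on logistic loss for a 2-feature linear classifier.
        w0, w1, b = model
        for _ in range(epochs):
            random.shuffle(data)
            for (x0, x1), y in data:
                p = 1 / (1 + math.exp(-(w0 * x0 + w1 * x1 + b)))
                g = p - y                  # d(log-loss)/d(logit)
                w0 -= lr * g * x0
                w1 -= lr * g * x1
                b -= lr * g
        return w0, w1, b

    def accuracy(model, data):
        w0, w1, b = model
        return sum(((w0 * x0 + w1 * x1 + b > 0) == bool(y)) for (x0, x1), y in data) / len(data)

    # Sequential training with no rehearsal: task B overwrites task A.
    m = train(train((0.0, 0.0, 0.0), list(task_a)), list(task_b))
    print("no replay, task A accuracy:", accuracy(m, task_a))

    # Rehearsal: mix a small buffer of stored task-A examples into the task-B phase.
    buffer = random.sample(task_a, 50)
    m = train(train((0.0, 0.0, 0.0), list(task_a)), list(task_b) + buffer)
    print("replay,    task A accuracy:", accuracy(m, task_a))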

The majority are focusing on the wrong paradigms and the wrong questions, which blocks progress towards the kinds of continual learning needed to create models that think in latent space and to enable meta-cognition, which in turn would give architectures the ability to avoid hallucinations by knowing what they don't know.

replies(2): >>43372436 #>>43374007 #
Nimitz14 ◴[] No.43372436[source]
Any continual learning papers you're a fan of?
replies(1): >>43372780 #
chriskanan ◴[] No.43372780{3}[source]
Depends on what angle you are interested in. If you are interested in continual learning for something like mitigating model drift, so that a model can stay up to date and the goal is to attain speed-ups during training, see these works:

On continual learning for ImageNet-1K, SIESTA requires 7x-60x less compute than other methods and achieves the same performance as a model trained in an offline/batch manner. It also works for arbitrary distributions, unlike many continual learning methods that only work for specific distributions (and hence don't really match any real-world use case): https://yousuf907.github.io/siestasite/

In this one we focused on mitigating the drop in performance when a system encounters a new distribution, which resulted in roughly a 16x speed-up: https://yousuf907.github.io/sgmsite/

In this one, we show how the strategy for creating multi-modal LLMs like LLaVA is identical to a two-task continual learning problem, and we note that many LLMs, once they become multi-modal, forget a large amount of the original LLM's capabilities. We demonstrate that continual learning methods can mitigate that drop in accuracy, enabling the multi-modal task to be learned without impairing uni-modal performance: https://arxiv.org/abs/2410.19925 [We have a couple of better approaches now that will be out in the next few months]

It really depends on what you are interested in. For production AI, the real need is computational efficiency and keeping strong models up-to-date. Not many labs besides mine are focusing on that.

Currently, I'm focused on continual learning for creating systems beyond LLMs that incrementally learn meta-cognition, and on using continual learning to explain how memory consolidation works in mammals and why we have REM phases during sleep -- but that's more of a cognitive science contribution, so the constraints on the algorithms differ since the goal differs.

replies(1): >>43375374 #
mft_ ◴[] No.43375374{4}[source]
> working on continual learning to explain how memory consolidation works in mammals and why we have REM phases during sleep

That's a nice model: human short-term memory is akin to the context window, and REM sleep consolidating longer-term memories is akin to updating the model itself.

How difficult would it be to perform limited focused re-training based on what's been learnt (e.g. new information, new connections, corrections of errors, etc.) within a context window?
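
To make the question concrete, here's a toy of what I'm imagining: the "weights" are a linear associative memory, the "context window" is a list of new key-value facts, and consolidation is a short, focused fine-tune on those facts mixed with replayed old memories so they aren't overwritten. `consolidate`, `recall`, and the whole setup are placeholders of mine, not a real system.

    import random

    random.seed(1)
    DIM = 16

    def rand_vec():
        return [random.gauss(0, 1) for _ in range(DIM)]

    def recall(W, key):
        # Linear associative memory: value ~= W @ key.
        return [sum(W[i][j] * key[j] for j in range(DIM)) for i in range(DIM)]

    def consolidate(W, context_window, replay, lr=0.05, steps=200):
        # "Sleep": a short, focused fine-tune on what accumulated in the
        # context window, mixed with replayed old memories so they survive.
        batch = context_window + replay
        for _ in range(steps):
            key, value = random.choice(batch)
            pred = recall(W, key)
            for i in range(DIM):
                err = pred[i] - value[i]
                for j in range(DIM):
                    W[i][j] -= lr * err * key[j]
        return W

    def error(W, fact):
        key, value = fact
        pred = recall(W, key)
        return sum((p - v) ** 2 for p, v in zip(pred, value)) / DIM

    W = [[0.0] * DIM for _ in range(DIM)]
    old_facts = [(rand_vec(), rand_vec()) for _ in range(5)]
    W = consolidate(W, old_facts, replay=[])                  # "day 1" memories

    new_facts = [(rand_vec(), rand_vec()) for _ in range(5)]  # today's context window
    W = consolidate(W, new_facts, replay=old_facts)           # REM-style consolidation

    # Both old and new facts should still be recalled with low error.
    print("old fact error:", round(error(W, old_facts[0]), 3))
    print("new fact error:", round(error(W, new_facts[0]), 3))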