
385 points vessenes | 6 comments

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, choosing a token at each step leads to runaway errors -- and these can't be damped mathematically.
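(To make the compounding concrete, here's a toy version of the argument as I understand it; the 1% per-token error rate and the independence assumption are mine, purely for illustration, not his actual math.)

    # Toy illustration of the compounding-error argument: assume each generated
    # token independently has a small chance e of going wrong; the chance an
    # n-token answer contains no error is then (1 - e)^n, which collapses fast.
    # The 1% rate and the independence assumption are illustrative only.
    e = 0.01
    for n in (10, 100, 1000):
        print(f"n={n:5d}  P(no error) ~ {(1 - e) ** n:.4g}")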

Instead, he offers the idea that we should have something like an 'energy minimization' architecture: as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say: I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it; I can't find much after the release of I-JEPA from his group. My rough mental model is sketched below.
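The way I picture it: a model assigns a single scalar 'energy' to an entire (prompt, response) pair, training pushes good pairs to lower energy than corrupted ones, and generation becomes a search for the lowest-energy response instead of committing to one token at a time. Everything in the sketch -- the bag-of-embeddings scorer, the margin loss, the random candidates -- is made up for illustration and is not LeCun's JEPA work.

    import torch
    import torch.nn as nn

    VOCAB, DIM = 1000, 64

    class ToyEnergyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, DIM)
            self.score = nn.Sequential(
                nn.Linear(2 * DIM, DIM), nn.ReLU(), nn.Linear(DIM, 1)
            )

        def forward(self, prompt_ids, response_ids):
            # Mean-pool token embeddings for prompt and response, then map the
            # concatenated pair to one scalar energy (lower = better match).
            p = self.embed(prompt_ids).mean(dim=1)
            r = self.embed(response_ids).mean(dim=1)
            return self.score(torch.cat([p, r], dim=-1)).squeeze(-1)

    model = ToyEnergyModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):
        prompt = torch.randint(0, VOCAB, (8, 16))  # fake prompts
        good = torch.randint(0, VOCAB, (8, 16))    # stand-ins for observed responses
        bad = torch.randint(0, VOCAB, (8, 16))     # stand-ins for corrupted responses
        # Margin loss: push energy of good pairs below energy of bad pairs.
        loss = torch.relu(1.0 + model(prompt, good) - model(prompt, bad)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # "Generation" here is just scoring whole candidate responses and keeping
    # the minimum-energy one, rather than committing to tokens one at a time.
    prompt = torch.randint(0, VOCAB, (1, 16))
    candidates = torch.randint(0, VOCAB, (5, 16))
    best = candidates[model(prompt.expand(5, -1), candidates).argmin()]

The hard parts (a real encoder, and how you actually search the space of responses instead of scoring five random candidates) are exactly what I'd like to hear researchers weigh in on.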

1. tyronehed ◴[] No.43365788[source]
Any transformer-based LLM will never achieve AGI because it's only trying to pick the next word. You need a much larger degree of planning to achieve AGI. Also, the characteristics of LLMs do not resemble any existing intelligence that we know of. Does a baby require 2 years of statistical analysis to become useful? No. Transformer architectures are parlor tricks. They are a glorified Google, but they're not doing any thinking or planning. If you want that, then you have to base your architecture on the known examples of intelligence that we are aware of in the universe. And that's not a transformer. In fact, whatever AGI emerges will absolutely not contain a transformer.
replies(3): >>43366660 #>>43366893 #>>43366959 #
2. flawn ◴[] No.43366660[source]
It's not just about picking the next word here; that alone doesn't refute whether Transformers can achieve AGI. Words are just one representation of information. And whether it resembles any intelligence we know is also not an argument, because there is no reason to believe that all intelligence must resemble the examples we've seen (e.g. us, or other animals). The underlying architecture of Attention & MLPs can surely still represent something we would call an AGI, and on certain tasks it can surely be considered one already. I also don't know for certain whether we will hit roadblocks or architectural asymptotes, but I haven't come across any well-founded argument that Transformers definitely could not reach AGI.
3. visarga ◴[] No.43366893[source]
The transformer is a simple and general architecture. Being such a flexible model, it needs to learn its "priors" from data; it makes few assumptions about the data distribution from the start. The same architecture can predict protein folding and fluid dynamics. It's not specific to language.

We, on the other hand, are shaped by billions of years of genetic evolution and 200k years of cultural evolution. If you count the total number of words spoken by the 110 billion people who have ever lived, assuming an estimated 1B words per human over a lifetime, it comes out to 10 million times the size of GPT-4's training set.

So we spent 10 million times more words on that discovery than it takes the transformer to catch up. GPT-4 used about 10 thousand people's worth of language to catch up on all that evolutionary fine-tuning.
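Back-of-the-envelope for those figures (GPT-4's training-set size is not public; ~1e13 tokens is just the order-of-magnitude estimate I'm assuming):

    # Rough arithmetic behind the comparison. The inputs are all assumptions:
    # ~110 billion humans ever, ~1 billion words per lifetime, and an assumed
    # ~1e13 tokens in GPT-4's training set (not public, order of magnitude only).
    humans_ever = 110e9
    words_per_lifetime = 1e9
    gpt4_training_tokens = 1e13

    total_human_words = humans_ever * words_per_lifetime            # ~1.1e20
    ratio = total_human_words / gpt4_training_tokens                # ~1e7  -> "10 million times"
    lifetimes_of_speech = gpt4_training_tokens / words_per_lifetime # ~1e4  -> "10 thousand people"

    print(f"{total_human_words:.1e} words ever spoken")
    print(f"{ratio:.1e}x GPT-4's training set")
    print(f"GPT-4 trained on ~{lifetimes_of_speech:.0f} lifetimes of speech")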

replies(1): >>43367897 #
4. unsupp0rted ◴[] No.43366959[source]
> Does a baby require 2 years of statistical analysis to become useful?

Well yes, actually.

replies(1): >>43369075 #
5. simne ◴[] No.43367897[source]
> words spoken by the 110 billion people who have ever lived, assuming an estimated 1B words per human over a lifetime ... comes out to 10 million times the size of GPT-4's training set

This assumption points in a slightly wrong direction, because no human could consume much more than about 1B words in their lifetime. So humanity could not gain an enhancement just by multiplying one person's words by 100 billion. I think a more correct estimate would be 1B words multiplied by 100.

I think current AI has already reached the size needed to become AGI, but to get there it probably needs a change of structure (though I'm not sure about this), and it also needs some additional multidimensional dataset, not just text.

I might bet on 3D cinema, and/or on autopilot datasets for automobiles, or on something from real-life humanoid robots solving typical human tasks, like folding a shirt.

6. nsonha ◴[] No.43369075[source]
...of the entire human race's knowledge, and it spans all of written history, not just 2 years.