In exchange, he offers the idea of an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.
Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much since his group released I-JEPA.
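For what it's worth, here is a minimal sketch of how I picture the "energy of a whole response" idea: a network that scores (prompt, response) pairs with a scalar energy and is trained contrastively to give low energy to good pairs and high energy to mismatched ones. This is just a toy hinge-loss formulation in PyTorch, not LeCun's actual JEPA/EBM design; all module names and sizes here are made up.

```python
import torch
import torch.nn as nn

class ResponseEnergy(nn.Module):
    """Toy energy model: maps (prompt, response) embeddings to a scalar 'energy'.
    Lower energy = more compatible pair. Names and dimensions are invented."""
    def __init__(self, dim=256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, prompt_emb, response_emb):
        pair = torch.cat([prompt_emb, response_emb], dim=-1)
        return self.scorer(pair).squeeze(-1)

energy = ResponseEnergy()
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

# Stand-in embeddings for a batch of prompts, matching responses, and mismatched ones.
prompt = torch.randn(32, 256)
good = torch.randn(32, 256)
bad = torch.randn(32, 256)

# Margin (hinge) contrastive loss: push energy of good pairs below bad pairs by a margin.
margin = 1.0
loss = torch.relu(margin + energy(prompt, good) - energy(prompt, bad)).mean()
opt.zero_grad()
loss.backward()
opt.step()

# At inference you would search or optimize over candidate responses for the lowest
# energy, rather than sampling tokens autoregressively.
```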
We, on the other hand, are shaped by billions of years of genetic evolution and 200k years of cultural evolution. If you count the total number of words spoken by the 110 billion people who have ever lived, assuming an estimated 1B words per human lifetime, it comes out to 10 million times the size of GPT-4's training set.
So we spent 10 million times more words on discovery than it takes the transformer to catch up. GPT-4 used about 10 thousand people's worth of language to catch up on all that evolutionary finetuning.
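A quick back-of-the-envelope check of those ratios, using the comment's own assumptions (110 billion humans, 1B words each) plus a training-set size of roughly 1e13 tokens for GPT-4, which is the figure the "10 thousand people" claim implies rather than a confirmed number:

```python
# Assumptions from the comment, not verified figures.
humans_ever = 110e9       # people who have ever lived
words_per_human = 1e9     # words per human lifetime
gpt4_tokens = 1e13        # assumed GPT-4 training-set size

total_human_words = humans_ever * words_per_human   # ~1.1e20 words

print(total_human_words / gpt4_tokens)   # ~1.1e7  -> "10 million times" GPT-4's data
print(gpt4_tokens / words_per_human)     # ~1e4    -> "10 thousand people's worth"
```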
Well yes, actually.
This assumption points in slightly the wrong direction, because no human could consume much more than about 1B words during their lifetime. So humanity could not gain an enhancement just by multiplying one human's words by 100 billion. I think a more correct estimate would be 1B words multiplied by 100.
I think current AI has already reached the size needed to become AGI, but to get there it probably needs a change in structure (though I'm not sure about this), and also some additional multidimensional dataset, not just text.
I might bet on 3D cinema, and/or an automobile autopilot dataset, or something from real-life humanoid robots solving typical human tasks, like folding a shirt.