The short answer should be that it's obvious LLM training and inference are both ridiculously inefficient and biologically implausible, and therefore there has to be some big optimization wins still on the table.
Instead, he offers the idea that we should have something that is an 'energy minimization' architecture; as I understand it, this would assign an 'energy' to an entire response, and training would try to minimize that.
Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.
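For anyone else trying to build intuition, here's a toy numerical sketch of how I read the 'energy minimization' idea. This is my own illustration, not LeCun's actual architecture: an energy function E(x, y) scores how compatible a candidate response y is with an input x, and a contrastive update pushes energy down on observed pairs and up on alternatives.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3)) * 0.1    # parameters of the toy energy function

def energy(x, y):
    # A simple bilinear energy: low when x and y are "compatible".
    return -x @ W @ y

x = np.array([1.0, 0.0, 0.0])
y_good = np.array([0.0, 1.0, 0.0])   # the observed (x, y) pair
y_bad = np.array([0.0, 0.0, 1.0])    # a corrupted alternative

lr = 0.5
for _ in range(50):
    # Contrastive step: gradient descent on E(x, y_good) - E(x, y_bad).
    # Since dE/dW = -outer(x, y), this lowers the good pair's energy
    # and raises the bad pair's.
    W += lr * (np.outer(x, y_good) - np.outer(x, y_bad))

e_good, e_bad = energy(x, y_good), energy(x, y_bad)
print("E(x, y_good) =", e_good, "  E(x, y_bad) =", e_bad)
```

After training, the observed pair ends up with much lower energy than the corrupted one, which is the whole point of the objective.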
> The short answer should be that it's obvious LLM training and inference are both ridiculously inefficient and biologically implausible, and therefore there has to be some big optimization wins still on the table.
I really like this approach: showing that we must be doing it wrong because our brains are more efficient, and we aren't doing it like our brains do.
Is this a common thing in ML papers or something you came up with?
Have you heard of https://en.wikipedia.org/wiki/Bio-inspired_computing ?
Inefficiency in data input is also an interesting concept. It seems to me humans take in more data than even modern frontier models, if you use the gigabit-per-second estimates for sensory input. Care to elaborate on your thoughts?
What I mean is this: a brain today is obviously far more efficient at intelligence than our current approaches to AI. But a brain is a highly specialized chemical computer that evolved over hundreds of millions of years. That process leaves a lot of room for inefficient and implausible strategies to play out; as long as wins are preserved, efficiency can still improve along the way.
So the question is really: can we shortcut that somehow?
It does seem like doing so would require a different approach. But so far all our other approaches to creating intelligence have been beaten by the big simple inefficient one. So it’s hard to see a path from here that doesn’t go that route.
We know there is a more efficient solution (human brain) but we don’t know how to make it.
So it stands to reason that we can make more efficient LLMs, just like a CPU can add numbers more efficiently than humans.
For example, analog computers can differentiate near-instantly by leveraging the nature of electromagnetism, and you can compute very basic analogs of complex equations just by connecting containers of water together in certain (very specific) configurations. Are we sure the optimizations needed to get us to AGI are possible without exploiting the physical nature of the world like this? That's without even touching the hot mess that is quantum mechanics and its role in chemistry, which in turn affects biology. I wouldn't put it past evolution to have stumbled upon some quantum mechanical effect that enabled the emergence of general intelligence.
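To make the analog-differentiation example concrete: the textbook op-amp differentiator computes a derivative continuously in hardware. For an ideal op-amp with an input capacitor $C$ and feedback resistor $R$, the output voltage tracks the time derivative of the input:

```latex
V_{\mathrm{out}}(t) = -RC \, \frac{dV_{\mathrm{in}}(t)}{dt}
```

No iteration, no discretization; the physics of the capacitor does the differentiation for free.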
I'm super interested in anything discussing this but have very limited exposure to the literature in this space.
[0] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
In ANNs we backprop uniformly, so error correction is distributed over the whole network: every parameter gets an update for every example. That's a big part of why LLM training is inefficient.
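A minimal sketch of what I mean (a tiny two-layer MLP of my own invention, not any particular LLM): a single scalar error at the output fans out through the chain rule into a gradient for every parameter in the network, so each training step touches all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4)) * 0.5   # hidden layer weights
W2 = rng.normal(size=(4, 1)) * 0.5   # output layer weights

x = rng.normal(size=(1, 8))          # one input example
y = np.array([[1.0]])                # its target

# Forward pass
h = np.tanh(x @ W1)
pred = h @ W2
err = pred - y                       # one scalar error at the output

# Backward pass: the chain rule distributes that error everywhere
dW2 = h.T @ err
dh = err @ W2.T
dW1 = x.T @ (dh * (1 - h**2))        # tanh'(z) = 1 - tanh(z)^2

nonzero = np.count_nonzero(dW1) + np.count_nonzero(dW2)
total = dW1.size + dW2.size
print(f"{nonzero}/{total} parameters received a gradient from one example")
```

Contrast that with what we think cortex does, where credit assignment is far more local and sparse.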
I believe human and machine learning unify into a pretty straightforward model, which suggests that the things our brains do that ML doesn't can be copied across, and I don't think the substrate is that significant.
Wheels (as opposed to rolling) would likely never evolve naturally because there's no real incremental path from legs to wheels, whereas flippers can evolve from webbed fingers getting incrementally better at moving in water.
I dunno, maybe there's an evolutionary path for wheels, but I don't think so.
Which is what we should expect, given prior experience with every other AI breakthrough: first we learn to do it, then we learn to do it efficiently.
E.g. Deep Blue in 1997 was IBM showing off a supercomputer, more than it was any kind of reasonably efficient algorithm, but those came over the next 20-30 years.