
385 points | vessenes | 1 comment

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, the token-by-token choice at each step leads to runaway errors -- these can't be damped mathematically.

In its place, he offers the idea that we should have something like an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try to minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether there's any engineering being done around it. I can't find much after the release of I-JEPA from his group.
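For what it's worth, here is the toy contrast I have in mind (just my own sketch -- the "model" and the energy function below are made-up stand-ins, not LeCun's actual architecture): greedy decoding commits to one token at a time, while an energy view scores whole candidate responses and keeps the lowest-energy one.

    # Toy contrast only -- my own sketch, not LeCun's actual architecture;
    # token_prob() and energy() are made-up stand-ins for a real model.
    import math
    import random

    VOCAB = ["the", "cat", "sat", "on", "mat", "ran", "away"]

    def token_prob(prefix, token):
        # Stand-in for a language model's next-token score.
        random.seed(hash((tuple(prefix), token)) % (2 ** 32))
        return random.random()

    def greedy_decode(length=5):
        # (a) Autoregressive: commit to the locally best token at each step.
        # An early bad choice is never revisited, so errors can compound.
        out = []
        for _ in range(length):
            out.append(max(VOCAB, key=lambda t: token_prob(out, t)))
        return out

    def energy(response):
        # (b) Hypothetical energy of a *whole* response, lower is better.
        # Here just a negative sum of log-scores, purely as a placeholder.
        return -sum(math.log(token_prob(response[:i], t) + 1e-9)
                    for i, t in enumerate(response))

    def energy_decode(num_candidates=50, length=5):
        # Score complete candidate responses and keep the lowest-energy one,
        # instead of committing token by token.
        candidates = [[random.choice(VOCAB) for _ in range(length)]
                      for _ in range(num_candidates)]
        return min(candidates, key=energy)

    print("greedy:", greedy_decode())
    print("energy:", energy_decode())

The attraction, as I read it, is that the objective is defined over the whole response rather than over each next token, so a single bad early step isn't locked in.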

1. prats226 | No.43369868
I feel like the success of LLMs has been a combination of multiple factors coming together favourably:

1) Hardware becoming cheap enough to train models beyond the size where we start to see emergent properties -- and it keeps getting cheaper.

2) A model architecture that can look at all inputs at the same time in a computationally cheap way (see the toy sketch below). CNNs and RNNs succeeded at smaller scale because they added inductive bias favourable to the input modality, but that also made them less generic. Attention is computationally simpler to scale and has lower inductive bias.

3) Unsupervised text from the internet as the data source: it needs only light pre-processing and almost no annotation effort, so it reaches the scale that scaling laws demand for large models. Text is also diverse enough to cover a huge variety of topics and thoughts, versus something like ImageNet, which is highly specific and costly to produce.
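To make point 2 concrete, a toy sketch (my own illustration in plain NumPy, not any particular model's code): attention computes all pairwise token interactions in one matrix multiply, while an RNN has to walk the sequence one step at a time.

    # Toy sketch of point 2 -- my own illustration, not any model's real code.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Every position attends to every other position in one shot:
        # an (n, n) score matrix from a single matrix multiply.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        return softmax(scores) @ V

    def rnn_step_through(X, W_h, W_x):
        # Sequential baseline: the state at step t depends on step t-1,
        # so the sequence must be processed one token at a time.
        h = np.zeros(W_h.shape[0])
        for x in X:
            h = np.tanh(W_h @ h + W_x @ x)
        return h

    n, d = 6, 4  # toy sequence length and hidden width
    X = np.random.randn(n, d)
    print(attention(X, X, X).shape)    # (6, 4), all positions handled in parallel
    print(rnn_step_through(X, np.random.randn(d, d), np.random.randn(d, d)).shape)  # (4,)

The all-at-once formulation parallelizes trivially on modern hardware, whereas the recurrent one can't avoid the step-by-step dependency -- which is a big part of why attention scaled so well.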

Assuming that text-only models hit a bottleneck, then for the next generation of models we need not just a new architecture but also a dataset that is even more generic and much richer in modalities, plus an architecture that can natively ingest it?

However, something that is not predictable is how much further the emergent properties can scale with model size. Maybe a few more unlocks -- the model retaining information well despite a really long context, or the ability to SFT on super complex reasoning tasks without disturbing the weights enough to lose the unsupervised learning -- might take us much further?