Andrej Karpathy: Software in the era of AI [video]

The slide at 13m claims that LLMs flip the script on technology diffusion and give power to the people. Nothing could be further from the truth.

Large corporations, which have become governments in all but name, are the only ones with the capability to create ML models of any real value. They're the only ones with access to vast amounts of information and resources to train the models. They introduce biases into the models, whether deliberately or not, that reinforces their own agenda. This means that the models will either avoid or promote certain topics. It doesn't take a genius to imagine what will happen when the advertising industry inevitably extends its reach into AI companies, if it hasn't already.

Even open weights models which technically users can self-host are opaque blobs of data that only large companies can create, and have the same biases. Even most truly open source models are useless since no individual has access to the same large datasets that corporations use for training.

So, no, LLMs are the same as any other technology, and actually make governments and corporations even more powerful than anything that came before. The users benefit tangentially, if at all, but will mostly be exploited as usual. Though it's unsurprising that someone deeply embedded in the AI industry would claim otherwise.

Well there are cases like OLMo where the process, dataset, and model are all open source. As expected though, it doesn't really compare well to the worst closed model since the dataset can't contain vast amounts of stolen copyrighted data that noticeably improves the model. Llama is not good because Meta knows what they're doing, it's good because it was pretrained on the entirety of Anna's Archive and every pirated ebook they could get their hands on. Same goes for Elevenlabs and pirated audiobooks.

Lack of compute on the Ai2's side also means the context OLMo is trained for is miniscule, the other thing that you need to throw brazillions of dollars at to make model that's maybe useful in the end if you're very lucky. Training needs high GPU interconnect bandwidth, it can't be done in distributed horde in any meaningful way even if people wanted to.

The only ones who have the power now are the Chinese, since they can easily ignore copyright for datasets, patents for compute, and have infinite state funding.