Ok I answered my own question.
In other words, the groups of folks working on training models don’t necessarily have access to the sort of optimization engineers that are working in other areas.
When all of this leaked into the open, it let a lot of people knowledgeable in different areas apply their own expertise to the task. Some of those efforts (mmap) paid off spectacularly. Expect industry to copy the best of these improvements.
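For context, the mmap win mentioned above comes down to mapping the weights file into the address space instead of read()-copying it into heap memory: the OS pages weights in lazily on first access and can share them across processes. A minimal Python sketch of the idea (the tiny "weights" file here is just an illustrative stand-in for a real model file):

```python
import mmap
import os
import struct
import tempfile

# Create a small stand-in "weights" file: four little-endian floats.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

# Map it read-only: no upfront copy, pages fault in on first access,
# and the page cache is shared if several processes map the same file.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        weights = struct.unpack_from("<4f", mm, 0)

os.remove(path)
print(weights)  # (1.0, 2.0, 3.0, 4.0)
```

The practical upshot for large models is startup time and memory: a multi-gigabyte file "loads" almost instantly because nothing is copied until it's touched.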
Of course it would save them some money to run their models on cheaper hardware, but having raised $11B, I don't think that's much of a concern right now. Better to spend the effort pushing the model forward, which some of these optimisations may make harder.
That'd be a 10,000-fold depreciation of an asset due to a preventable oversight. Ouchies.