I agree with you that with Moore's Law dead, we can't expect much more from current, silicon-based GPU compute. Any improvement from hardware alone will have to come from a completely new compute technology, and I don't think any candidate is mature enough to expect results in the next 10 years.
Right now, hardware-wise, we need more RAM in GPUs more than we need more compute. But it's a breakpoint issue: you need enough RAM to hold the model. More RAM that still falls short of the model's size doesn't improve things much, and RAM beyond the model's size is largely dead weight.
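A back-of-the-envelope sketch of that breakpoint (the precision, overhead factor, and model sizes here are illustrative assumptions, not measurements):

```python
def vram_needed_gb(params_billions: float, bytes_per_param: float = 2.0,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at the given precision (fp16 = 2
    bytes/param) plus a fudge factor for KV cache and runtime overhead."""
    return params_billions * bytes_per_param * overhead

def fits(params_billions: float, vram_gb: float) -> bool:
    """The breakpoint: either the whole model fits, or it doesn't."""
    return vram_needed_gb(params_billions) <= vram_gb

# A 70B model at fp16 needs roughly 168 GB, so a 24 GB card is nowhere
# close -- and upgrading to 32 GB buys you nothing, because RAM short of
# the model is dead weight. A 7B model (~17 GB) fits fine on 24 GB.
print(fits(70, 24))  # False
print(fits(70, 32))  # False: more RAM, still under the breakpoint
print(fits(7, 24))   # True
```

The step function is the point: capacity only matters at the threshold where the model either fits entirely in VRAM or spills into much slower host memory.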
I don't think larger models are going to show any major inference improvements. They hit the long tail of diminishing returns on training compute vs. quality of output at least two years ago.
I think the best anyone can hope for in optimizing current LLM technology is improving the performance of inference engines, and even there I can imagine at most about a 5x improvement. That would be a really long tail of performance optimizations taking at least a decade to achieve. On a 1-to-2-year timeline, the best that could be hoped for is a 2x improvement. But I think much of the low-hanging optimization fruit has already been picked, and we're starting to turn the curve into that long tail of incremental improvements.
I think everyone betting on LLMs boosting junior-to-mid-level devs, and that leading to a renaissance of software development speed, is wildly overestimating the share of productivity those developers represent in the first place. Most of the most important features are banged out by harried, highly skilled senior developers. Most everyone else is cleaning up around the edges of that. Even a 2x or 3x improvement of the bottom 10% of contributions only grows the pie so much. And I think these tools are basically useless to skilled senior devs. All this "boilerplate" code folks keep cheering the AI is writing for them is just not that big of a deal: 15 minutes of savings once a month.
But I see how this technology works and what people are asking it to do. In my company, that's basically "all the hard work you already weren't doing" (and if you don't really know how to do that work, how are you going to instruct an LLM to do it?). The gap between the two is so huge that I think it's going to take at least a 100x improvement to close it.
I can't see AI being all that much of an improvement on productivity. It still gives wrong results too often. The work needed to make it give good results is the same sort of work we should have been doing already to leverage classical ML systems, which have more predictable performance and output. As an industry we're going to spend trillions chasing AI, and it will end up being an exercise in making sure documents are stored in a coherent, searchable way. At which point, why not just do that, and avoid pressuring the energy industry into firing up a bunch of old coal plants to meet demand?