Whatever model is at the top can be surpassed if a competitor has enough compute scale. We are rapidly approaching the era where it's difficult to get enough power into one campus. If models continue to scale at 4.7x/year (see Epoch.ai), distributed sites are needed simply from a power perspective. You have to put the data centers where the power is and connect them together.
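To make the power argument concrete, here is a back-of-envelope sketch in Python. Only the 4.7x/year compute growth rate comes from the text (Epoch.ai); the starting campus power, the per-year efficiency gain, and the single-campus ceiling are illustrative assumptions, not measured figures.

```python
# Back-of-envelope sketch of the power argument above. All numbers are
# illustrative assumptions except the 4.7x/year compute growth (Epoch.ai).

COMPUTE_GROWTH_PER_YEAR = 4.7      # from the text (Epoch.ai)
EFFICIENCY_GAIN_PER_YEAR = 1.4     # assumed perf/watt improvement per year
START_POWER_MW = 100.0             # assumed draw of a large training campus today
CAMPUS_LIMIT_MW = 1000.0           # assumed practical ceiling for one campus (~1 GW)

power_mw = START_POWER_MW
for year in range(1, 6):
    # Power needed scales with compute, discounted by efficiency gains.
    power_mw *= COMPUTE_GROWTH_PER_YEAR / EFFICIENCY_GAIN_PER_YEAR
    note = " <-- exceeds single-campus ceiling" if power_mw > CAMPUS_LIMIT_MW else ""
    print(f"year {year}: ~{power_mw:,.0f} MW{note}")
```

Under these assumptions the required power passes the single-campus ceiling within about two years, which is the point of the argument: the exact numbers matter less than the compounding.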
I believe the era of distributed training is already here; however, not everyone will be able to distribute training across multiple sites with their scale-up networks, and their scale-out networks will not be ready. So it could be that we see models plateau until distributed training infrastructure is available.
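To illustrate the scale-up vs. scale-out distinction, here is a toy two-level gradient-averaging sketch in Python. The topology (three sites, four accelerators each) and the gradient values are invented for illustration, and the collectives are simulated with plain lists; a real system would use a collective library over the actual fabrics rather than this in-process stand-in.

```python
# Toy sketch of hierarchical gradient averaging: reduce on the fast scale-up
# fabric inside each site first, then across sites on the slower scale-out
# network. Topology and values are made-up illustrations.

from typing import List

def all_reduce_mean(values: List[float]) -> float:
    """Average per-device values (stand-in for a real collective op)."""
    return sum(values) / len(values)

# Assumed topology: 3 sites, each with 4 accelerators (equal counts so the
# two-stage mean equals the global mean).
grads_per_site = [
    [0.9, 1.1, 1.0, 1.2],   # site A local gradients
    [0.8, 1.0, 1.1, 0.9],   # site B
    [1.3, 1.2, 1.0, 1.1],   # site C
]

# Stage 1: reduce within each site over the scale-up network (high bandwidth).
site_means = [all_reduce_mean(g) for g in grads_per_site]

# Stage 2: reduce across sites over the scale-out network (long-haul, slower).
global_mean = all_reduce_mean(site_means)

print(f"per-site means: {site_means}")
print(f"global gradient mean: {global_mean:.4f}")
```

The point of the two stages is that only the small per-site results cross the long-haul links; if the scale-out network isn't built for even that coordinated traffic, multi-site training stalls regardless of how good the in-building fabric is.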
I see the infrastructure side of AI, and based on hardware build-out, Google has been slow to build and is behind everywhere.