If AI companies in the US are penalized for this, then the effect on copyright holders will only be slowed until foreign AI companies overtake them. In such cases the legal recourse will be much slower and significantly limited.
Access to copyrighted materials might make for slightly better-trained models the way that access to more powerful GPUs does. But I don't think it will accelerate foundational advances in the underlying technology. If anything, maybe having to compete under tight constraints means AI companies will have to innovate more, rather than merely push scale.
The problem is that regardless of any innovations, scale still matters. If you figure out the technique to, say, make a model that is significantly better given N parameters - where N is just large enough to be the perfect fit for the amount of training data that you have access to - then someone else with access to more data will use the same technique to make a model with >N parameters, and it will be better than yours.
Gee, perhaps we should not have done this in the first place. 'Foreigners might copy the irresponsible thing we did so we have to do more of it' is not the most brilliant argument.
But AI is mostly scale and only a little bit innovation. It’s undergraduate maths and a whole lot of computing power and data. Not being able to train on data on the internet would be a significant handicap.