1. more data gets walled off as owners realise its value
2. Stack Overflow-type feedback loops cease to exist as fewer people ask a public question and get public answers; instead they ask a model privately and get an answer based on the last visible public solutions
3. bad actors start deliberately trying to poison inputs (if sites served malicious responses to GPTBot/CCBot crawlers only, would we even know right now?)
4. more and more content becomes synthetically generated, to the point that pre-2023 physical books become the last known-good knowledge
5. governments and IP lawyers finally catch up
What's amazing to me is that no one is throwing around accusations of plagiarism.
I still think that if the "wrong people" had tried doing this they would have been obliterated by the courts.