If you can listen to billions of tokens a day, you can basically capture all the magic.
DeepSeek is the most notable case, but it's been used a lot.
And the foundation model companies are scraping and exfiltrating each other's data.