> Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have a reasonable interest in getting paid for their efforts.

That's the thing though: intuitively, they do have such a right - training the model != generating from the model, and it's the output of a generation that can violate copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.
> Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.
That's 100% true; I know that, and I'm not denying it. But in this particular case, I find my own views align with theirs. I don't begrudge them raking in heaps of money offering generative AI services, because they're legitimately offering value that's at least commensurate with what they charge (IMHO it's much greater), and that value comes entirely from the work they're uniquely able to do; any individual work that went into the training data contributes approximately zero to it.
(GenAI doesn't rely on any individual work in its training data; it relies on the breadth and volume of that data being a notable fraction of humanity's total intellectual output. It just so happens that almost all knowledge and culture is subject to copyright, so you couldn't really get there without stepping on some legal landmines.)
(Also, much as AI companies would like the law to favor them, their opponents in this case would like the law to dictate that they be compensated for their works being used as training data - and compensated far beyond any value those works actually contribute, which, again, is approximately zero.)