A point of clarification and some questions.
The portion the court said was bad was not Anthropic getting books from pirate sites to train its model. The court opined that training the model was fair use and did not distinguish between getting the books from pirate sites and getting them from hard-copy scans. The part the court said was bad, which was settled, was Anthropic getting books from a pirate site to store in a general-purpose library.
--
"To summarize the analysis that now follows, the use of the books at issue to train Claude
and its precursors was exceedingly transformative and was a fair use under Section 107 of the
Copyright Act. And, the digitization of the books purchased in print form by Anthropic was
also a fair use but not for the same reason as applies to the training copies. Instead, it was a
fair use because all Anthropic did was replace the print copies it had purchased for its central
library with more convenient space-saving and searchable digital copies for its central
library — without adding new copies, creating new works, or redistributing existing copies.
However, Anthropic had no entitlement to use pirated copies for its central library. Creating a
permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy."
"Because the legal issues differ between the *library copies* Anthropic purchased and
pirated, this order takes them in turn."
--
Questions
As an author, do you think it matters where the book was copied from? Presumably, a copyright gives the author the right to control when a text is reproduced and distributed. If the AI company buys a book and scans it, they are reproducing the book without a license, correct? And fair use is the argument that even though they violated the copyright, they are excused. In a pure sense, if the AI company copied from a "pirate source" (assuming they didn't torrent the book back), why is that copy worse than if they copied from a hard-copy book?