Book authors may see some settlement checks down the line. So might newspapers and other parties that can organize and throw enough $$$ at the problem. But I'll eat my hat if your average blogger ever sees a single cent.
Other people have said that Anthropic bought the books later on, but I haven't found any official records for that. Where would I find that?
Also, does anyone know which Anthropic models were NOT trained on the pirated books? I want to avoid the ones that were.
https://storage.courtlistener.com/recap/gov.uscourts.cand.43....
"Similarly, different sets or “subsets” or “parts of” or “portions” of the collections sourced from Books3, LibGen, and PiLiMi were used to train different LLMs..." Page 5
"In sum, the copies of books pirated or purchased-and-destructively-scanned were placed into a central “research library” or “generalized data area,” sets or subsets were copied again to create training copies for data mixes, the training copies were successively copied to be cleaned, tokenized, and compressed into any given trained LLM, and once trained an LLM did not output through Claude to the public any further copies." Page 7
The sentence "Finally, once Anthropic decided a copy of a pirated or scanned book in the library would not be used for training at all or ever again, Anthropic still retained that work as a “hard resource” for other uses or future uses" implies to me that Anthropic excluded certain books from training, not that it excluded all the pirated books from training.