Anthropic agrees to pay $1.5B to settle lawsuit with book authors

(www.nytimes.com)

989 points acomjean | 2 comments | 05 Sep 25 19:52 UTC

Also https://www.washingtonpost.com/technology/2025/09/05/anthrop..., https://www.reuters.com/sustainability/boards-policy-regulat...

non_aligned ◴[05 Sep 25 21:01 UTC] No.45143568[source]▶

>>45142885 (OP) #

I'm gonna say one thing. If you agree that something was unfairly taken from book authors, then the same thing was taken from people publishing on the web, and on a larger scale.

Book authors may see some settlement checks down the line. So might newspapers and other parties that can organize and throw enough $$$ at the problem. But I'll eat my hat if your average blogger ever sees a single cent.

replies(3): >>45143814 #>>45143940 #>>45144227 #

ascorbic ◴[05 Sep 25 21:38 UTC] No.45143940[source]▶

>>45143568 #

The settlement was for downloading the pirated books, not for training on them. Unless a blog is paywalled, it would be hard to argue the same for it.

replies(1): >>45144747 #

rise_before_sun ◴[05 Sep 25 23:05 UTC] No.45144747[source]▶

>>45143940 #

It seems weird that there was legal culpability for downloading pirated books but not for training on them. At the very least, there is a transitive dependency between the two acts.

Other people have said that Anthropic bought the books later on, but I haven't found any official records for that. Where would I find that?

Also, does anyone know which Anthropic models were NOT trained on the pirated books? I want to avoid the ones that were.

replies(1): >>45145817 #

1. emtel ◴[06 Sep 25 01:42 UTC] No.45145817[source]▶

>>45144747 #

As far as anyone knows, no models were trained on the illegally downloaded books.

replies(1): >>45145941 #

2. rise_before_sun ◴[06 Sep 25 02:05 UTC] No.45145941[source]▶

>>45145817 (TP) #

The following document indicates otherwise.

https://storage.courtlistener.com/recap/gov.uscourts.cand.43....

"Similarly, different sets or “subsets” or “parts of” or “portions” of the collections sourced from Books3, LibGen, and PiLiMi were used to train different LLMs..." Page 5

"In sum, the copies of books pirated or purchased-and-destructively-scanned were placed into a central “research library” or “generalized data area,” sets or subsets were copied again to create training copies for data mixes, the training copies were successively copied to be cleaned, tokenized, and compressed into any given trained LLM, and once trained an LLM did not output through Claude to the public any further copies." Page 7

The phrase "Finally, once Anthropic decided a copy of a pirated or scanned book in the library would not be used for training at all or ever again, Anthropic still retained that work as a “hard resource” for other uses or future uses" implies to me Anthropic excluded certain books from training, not that they excluded all the pirated books from training.
