989 points acomjean | 8 comments

aeon_ai:
To be very clear on this point: this is not related to model training.

It's important in the fair use assessment to understand that the training itself is fair use; the issue at hand here is the pirating of the books, which is what Anthropic "whoopsied" into when acquiring the training data.

Buying used copies of books, scanning them, and training on them is fine.

Rainbows End, with its scenes of libraries being destructively shredded and scanned en masse, was prescient in many ways.

rchaud:
> Buying used copies of books, scanning them, and training on it is fine.

But nobody was ever going to do that, not when there are billions in VC dollars at stake for whoever moves fastest. Everybody will simply risk the fine, which tends to be nowhere near large enough to deter the same behavior in the future.

That is like saying Uber would not have had any problems if they had just entered into licensing contracts with taxi medallion holders. It was faster to just put unlicensed taxis on the streets and use investor money to pay fines and lobby for favorable legislation. In the same way, it was faster for Anthropic to load up their models with un-DRM'd PDFs and ePubs from wherever instead of licensing them publisher by publisher.

greensoap:
Anthropic literally did exactly this to train its models, according to the lawsuit. The lawsuit found that Anthropic didn't even use the pirated books to train its model. So there is that.

hcs:
The lawsuit didn't find anything; Anthropic claimed this as part of the settlement. Companies settle without admission of wrongdoing all the time, to the extent that it can be bargained for.

ijk:
The judge's ruling from earlier certainly seemed to me to suggest that the training was fair use.

Obviously, that's not part of the current settlement. I'm no expert on this, so I don't know the extent to which the earlier ruling applies.

hcs:
If I'm reading this right, yes, the training was fair use, but I was responding (unclearly) to the claim that the pirated books weren't used to train commercially released LLMs. The judge complained that it wasn't clear what was actually used. From the June order https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/... [pdf]:

> Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs — and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs.

> We know that Anthropic has more information about what it in fact copied for training LLMs (or not). Anthropic earlier produced a spreadsheet that showed the composition of various data mixes used for training various LLMs — yet it clawed back that spreadsheet in April. A discovery dispute regarding that spreadsheet remains pending.

rise_before_sun:
Thanks for this info. I was trying to find out which pirated books were used for which model.

Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?

To me, the taint still remains, which is a shame, because it's considered the best coding model so far.

heavyset_go:
> Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?

No, in part because it removes agency from the authors/rightsholders. Maybe they don't want to sell Anthropic their books, maybe they want royalties, etc.

jack_pp:
Can authors even claim such rights, though? I doubt they ever had such agency to begin with.

heavyset_go:
If they're the rightsholders, they can do whatever they want with their IP, including changing licensing terms, adding contractual obligations forbidding certain types of use, forbidding sale, etc.

flir:
I feel like that would bounce hard off first sale doctrine. But what do I know.

heavyset_go:
You still have to adhere to license and copyright terms after first sale.

You can't sell a Blu-ray disc to a movie theater and thereby give them the right to charge an audience to watch it in the theater later; public performance rights don't transfer with the disc.

If rightsholders are worried about certain uses of their IP being found to be fair use, they might then change the terms of release contractually to stop or at least partially prevent training.