
989 points by acomjean | 1 comment
aeon_ai:
To be very clear on this point: this is not related to model training.

It's important, in the fair use assessment, to understand that the training itself is fair use; the issue at hand is the pirating of the books, which is what Anthropic "whoopsied" into when acquiring its training data.

Buying used copies of books, scanning them, and training on them is fine.

Rainbows End was prescient in many ways.

therobots927:
It is related to scalable model training, however. Chopping the spines off books and feeding the pages through an automated scanner is not scalable. And don't forget the cost of 1) finding, 2) purchasing, 3) processing, and 4) recycling that volume of books.
debugnik:
I guess companies will buy the cheapest copies to cover liability and then use the pirated dumps anyway. Or just pretend that someone lent them the books.