(www.nytimes.com)

989 points acomjean | 2 comments | 05 Sep 25 19:52 UTC

Also https://www.washingtonpost.com/technology/2025/09/05/anthrop..., https://www.reuters.com/sustainability/boards-policy-regulat...


petralithic [05 Sep 25 20:54 UTC] No.45143482

>>45142885 (OP)

This is sad for open source AI. Piracy for the purpose of model training should also be fair use, because otherwise only big companies like Anthropic, which can afford to pay off publishers, will be able to do it. There is no way to buy billions of books just for model training; it simply can't happen.


sefrost [05 Sep 25 21:22 UTC] No.45143780

>>45143482

I wonder how much it would cost to buy every book you'd want to train a model on.


GMoromisato [05 Sep 25 21:58 UTC] No.45144140

>>45143780

500,000 x $20 = $10 million

Obviously there would be handling and scanning costs on top of that, so that's the floor.

Maybe $20 million total? Plus, of course, the time it would take to execute.
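The back-of-envelope estimate above can be written out as a small sketch. The 500,000 books and $20 purchase price come from the comment; the $20-per-book handling and scanning overhead is a hypothetical placeholder, chosen only so the total lands on the commenter's "maybe $20 million" guess.

```python
# Back-of-envelope book-acquisition cost, following the estimate above.
# BOOKS and PRICE_PER_BOOK come from the comment; HANDLING_AND_SCAN is a
# hypothetical per-book overhead, not a sourced figure.

BOOKS = 500_000
PRICE_PER_BOOK = 20       # USD, purchase price from the comment
HANDLING_AND_SCAN = 20    # USD per book, hypothetical assumption


def acquisition_cost(books: int, price: int, overhead: int) -> int:
    """Return total cost in USD: purchase price plus per-book overhead."""
    return books * (price + overhead)


floor = acquisition_cost(BOOKS, PRICE_PER_BOOK, 0)                 # purchase only
total = acquisition_cost(BOOKS, PRICE_PER_BOOK, HANDLING_AND_SCAN)  # with overhead

print(f"floor: ${floor:,}")  # floor: $10,000,000
print(f"total: ${total:,}")  # total: $20,000,000
```

Changing HANDLING_AND_SCAN shows how sensitive the total is to the overhead guess, which the comment leaves open.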


riskable [06 Sep 25 19:43 UTC] No.45152288

>>45144140

The real expense is in the data centers/hardware.

The cost of the books is negligible in comparison.


Scoundreller [06 Sep 25 21:25 UTC] No.45152990

>>45152288 (TP)

Somewhere, a gritty warehouse in a developing country is receiving shipping containers of old books, where massive teams manually flip each page while a second-hand Canon digicam photographs it, to be OCR'd by the same AI being trained.

Once the books are done, 99% of them go into the furnace at the district heating boiler next door. The other 1% go back to a developed country for resale.


Anthropic agrees to pay $1.5B to settle lawsuit with book authors