989 points acomjean | 15 comments
    aeon_ai ◴[] No.45143392[source]
    To be very clear on this point - this is not related to model training.

    It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.

    Buying used copies of books, scanning them, and training on it is fine.

    Rainbows End was prescient in many ways.

    rchaud ◴[] No.45144837[source]
    > Buying used copies of books, scanning them, and training on it is fine.

But nobody was ever going to do that, not when there are billions in VC dollars at stake for whoever moves fastest. Everybody will simply risk the fine, which tends to be nowhere near large enough to have a deterrent effect.

That is like saying Uber would not have had any problems if they had just entered into licensing contracts with taxi medallion holders. It was faster to put unlicensed taxis on the streets and use investor money to pay fines and lobby for favorable legislation. In the same way, it was faster for Anthropic to load up its models with un-DRM'd PDFs and ePubs from wherever, instead of licensing them publisher by publisher.

    1. jayd16 ◴[] No.45145297[source]
> But nobody was ever going to do that

    Didn't Google have a long standing project to do just that?

    https://en.wikipedia.org/wiki/Google_Books

    2. efskap ◴[] No.45146230[source]
    Crazy to think we've been helping train AI through captchas long before the "click all squares containing" ones.
    3. a2128 ◴[] No.45146752[source]
"stop spam. read books." is a very ironic phrase to look back on, considering the amount of spam on the internet that LLMs have since enabled
    4. miohtama ◴[] No.45147075[source]
This lawsuit also makes sure that the only parties that can now train an AI on good enough training material are:

- Google

- Anthropic

- Any Chinese company that doesn't care about copyright law

What is the cost of buying and scanning books?

Copyright law needs to be fixed and its ridiculous hundred-year term chopped away.

    5. slow_typist ◴[] No.45147378[source]
Training a model only on 100+ year old literature could be an interesting experiment, though.
    6. godelski ◴[] No.45147411[source]
    From TFA

      The Google Books project also faced a copyright lawsuit, which was eventually decided in favor of Google.
    
  After contacting major publishers about possibly licensing their books, [former head of the Google Books project] bought physical books in bulk from distributors and retailers, according to court documents. He then hired outside organizations to dissemble the books, scan them and create digital copies that could be used to train the company’s A.I. technologies.
    
      Judge Alsup ruled that this approach was fair use under the law. But he also found the company’s previous approach — downloading and storing books from shadow libraries like Library Genesis and Pirate Library Mirror — was illegal.
    7. godelski ◴[] No.45147421[source]
    From TFA

      > Anthropic also agreed to delete the pirated works it downloaded and stored.
    
    Also

      > As part of the settlement, Anthropic said that it did not use any pirated works to build A.I. technologies that were publicly released.
    8. rollcat ◴[] No.45147574{3}[source]
    ’Twould wax yet more marvellous to ye beholders.
    9. Iolaum ◴[] No.45147620{3}[source]
Reminds me of when Facebook told the EU that they did not have the technology to merge Facebook and WhatsApp accounts when they bought WhatsApp.
    10. IshKebab ◴[] No.45147690{3}[source]
    It's been done.

    https://github.com/haykgrigo3/TimeCapsuleLLM

    11. kelnos ◴[] No.45148168{3}[source]
    That's not really the point, though, is it? Now Anthropic can afford to buy books and get them scanned. They likely didn't have the money or time to do that before.

    And even if they didn't use the illegally-obtained work to train any of the models they released, of course they used them to train unreleased prototypes and to make progress at improving their models and training methods.

    By engaging in illegal activity, they advanced their business faster and more cheaply than they otherwise would have been able to. With this settlement, other new AI companies will see it on the record that they could face penalties if they do this, and will have to go the slower, more expensive route -- if they can even afford to do so.

    It might not make it impossible, but it makes the moat around the current incumbents just that much wider.

    12. imwm ◴[] No.45151241[source]
    Disassemble*
    13. rchaud ◴[] No.45151280[source]
That wasn't done as a play for venture capital. The Google Books project began before eBooks existed; in the 2000s, Google spent money on all kinds of projects that had no real strategy for monetization. I remember Google Books being a valuable resource, as it digitized books that were out of print — back when they actually cared about making information widely available.
    14. DrillShopper ◴[] No.45153027{3}[source]
    > As part of the settlement, Anthropic said that it did not use any pirated works to build A.I. technologies that were publicly released.

Oh, so now we're at "just trust me bro" levels of absurdity.

    15. Thorrez ◴[] No.45155759[source]
Yeah. Weird that rchaud said "But nobody was ever going to do that" when the article talks about someone doing exactly that.