
989 points acomjean | 1 comments
aeon_ai ◴[] No.45143392[source]
To be very clear on this point - this is not related to model training.

It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.

Buying used copies of books, scanning them, and training on it is fine.

Rainbows End was prescient in many ways.

rchaud ◴[] No.45144837[source]
> Buying used copies of books, scanning them, and training on it is fine.

But nobody was ever going to do that, not when there are billions in VC dollars at stake for whoever moves fastest. Everybody will simply risk the fine, which tends to be nowhere near large enough to have a deterrent effect in the future.

That is like saying Uber would not have had any problems if they had just entered into a licensing contract with taxi medallion holders. It was faster to just put unlicensed taxis on the streets and use investor money to pay fines and lobby for favorable legislation. In the same way, it was faster for Anthropic to load up their models with un-DRM'd PDFs and ePUBs from wherever instead of licensing them publisher by publisher.

jayd16 ◴[] No.45145297[source]
> But nobody was ever going to do that

Didn't Google have a long-standing project to do just that?

https://en.wikipedia.org/wiki/Google_Books

miohtama ◴[] No.45147075[source]
This lawsuit also ensures that the only parties that can train an AI with good enough training material are now:

- Google

- Anthropic

- Any Chinese company that does not care about copyright laws

What is the cost of buying and scanning books?

Copyright law needs to be fixed and its ridiculous hundred-year term cut down.

godelski ◴[] No.45147421[source]
From TFA

  > Anthropic also agreed to delete the pirated works it downloaded and stored.

Also

  > As part of the settlement, Anthropic said that it did not use any pirated works to build A.I. technologies that were publicly released.
1. Iolaum ◴[] No.45147620{3}[source]
Reminds me of when Facebook told the EU that they did not have the technology to merge FB and WhatsApp accounts when they bought WhatsApp.