
989 points acomjean | 2 comments
aeon_ai No.45143392
To be very clear on this point - this is not related to model training.

It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.

Buying used copies of books, scanning them, and training on it is fine.

Rainbows End was prescient in many ways.

jimmydoe No.45143567
Google scanned many books quite a while ago, probably far more than LibGen holds. Are they allowed to use them for training?
1. johanyc No.45143630
If they legally purchased them, I don't see why not. IIRC they did borrow from libraries, so probably not every book in Google Books was purchased.
2. greensoap No.45145402
According to the judge, Anthropic legally purchased the books it used to train its model, and the judge said that was fine. Anthropic also downloaded books from a pirate site, and the judge said that was bad -- even though the judge also said they didn't use those books for training...