←back to thread

989 points acomjean | 1 comments | | HN request time: 0.203s | source
Show context
aeon_ai ◴[] No.45143392[source]
To be very clear on this point - this is not related to model training.

It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.

Buying used copies of books, scanning them, and training on it is fine.

Rainbows End was prescient in many ways.

replies(36): >>45143460 #>>45143461 #>>45143507 #>>45143513 #>>45143567 #>>45143731 #>>45143840 #>>45143861 #>>45144037 #>>45144244 #>>45144321 #>>45144837 #>>45144843 #>>45144845 #>>45144903 #>>45144951 #>>45145884 #>>45145907 #>>45146038 #>>45146135 #>>45146167 #>>45146218 #>>45146268 #>>45146425 #>>45146773 #>>45146935 #>>45147139 #>>45147257 #>>45147558 #>>45147682 #>>45148227 #>>45150324 #>>45150567 #>>45151562 #>>45151934 #>>45153210 #
jimmydoe ◴[] No.45143567[source]
Google scanned many books quite a while ago, probably way more than LibGen. Are they good to use them for training?
replies(4): >>45143582 #>>45143630 #>>45143640 #>>45143828 #
ortusdux ◴[] No.45143828[source]
They litigated this a while ago and my understanding was that they were able to claim fair use, but I'm no expert.

What I'm wondering is if they, or others, have trained models on pirated content that has flowed through their networks?

replies(1): >>45144446 #
1. jazzyjackson ◴[] No.45144446[source]
Books.Google.Com was deemed fair use because it only shows previews, not full downloads. Internet Archive is still under litigation iirc besides having owned a physical copy of every book they ever scanned (and keeping a copy in their warehouses) they let people read the whole thing.

I’m surprised Google hasn’t hit its competitors harder with the fact that they actually got permission to scan books from its partner libraries and Facebook and OpenAI just torrented books2/books3, but I guess they have aligned incentive to benefit from a legal framework that doesn’t look to closely at how you went about collecting source material