
989 points acomjean | 2 comments
aeon_ai ◴[] No.45143392[source]
To be very clear on this point - this is not related to model training.

It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.

Buying used copies of books, scanning them, and training on it is fine.

Rainbows End was prescient in many ways.

florbnit ◴[] No.45144321[source]
> Buying used copies of books, scanning them, and training on it is fine.

Buying used copies of books, scanning them, and printing them and selling them: not fair use

Buying used copies of books, scanning them, and making merchandise and selling it: not fair use

The idea that training models is considered fair use just because you bought the work is naive. Fair use is not a catch-all that permits any usage not explicitly ruled out. It’s a law that specifically allows certain usages like criticism, comment, news reporting, teaching, scholarship, or research. Training AI models for purposes other than purely academic fits into none of these.

bigmadshoe ◴[] No.45144357[source]
Buying used copies of books, scanning them, training an employee with the scans: fair use.

Unless legislation changes, model training is pretty much analogous to that. Now of course if the employee in question - or the LLM - regurgitates a copyrighted piece verbatim, that is a violation and would be treated accordingly in either case.

1. bink ◴[] No.45144425[source]
> Buying used copies of books, scanning them, training an employee with the scans: fair use.

Does this still hold true if multiple employees are "trained" from scanned copies at the same time?

2. bigmadshoe ◴[] No.45144515[source]
If it's done simultaneously, I guess that would violate copyright, which is an interesting point. Maybe there's a case to be made there with model training.

Regardless, the issue could be resolved by buying as many copies as you have concurrent model training instances. It isn't really an issue with training on copyrighted work, just a matter of how you do so.