
989 points by acomjean | 1 comment
aeon_ai No.45143392
To be very clear on this point: this is not related to model training.

It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.

Buying used copies of books, scanning them, and training on it is fine.

Rainbows End was prescient in many ways.

shortformblog No.45144244
Thanks for the reminder that what the Internet Archive did in its case would have been legal if it had been in service of an LLM.
tpmoney No.45152104
I like the IA as much as anyone else, but surely there's a significant difference between distributing literal, word-for-word copies of copyrighted material and distributing statistical indexes about copyrighted material, right?