
989 points acomjean | 1 comment
aeon_ai ◴[] No.45143392[source]
To be very clear on this point - this is not related to model training.

It’s important to the fair use assessment to understand that the training itself was held to be fair use; the issue at hand is the pirating of the books, which is what Anthropic “whoopsied” its way into when acquiring the training data.

Buying used copies of books, scanning them, and training on them is fine.

Rainbows End was prescient in many ways.

replies(36): >>45143460 #>>45143461 #>>45143507 #>>45143513 #>>45143567 #>>45143731 #>>45143840 #>>45143861 #>>45144037 #>>45144244 #>>45144321 #>>45144837 #>>45144843 #>>45144845 #>>45144903 #>>45144951 #>>45145884 #>>45145907 #>>45146038 #>>45146135 #>>45146167 #>>45146218 #>>45146268 #>>45146425 #>>45146773 #>>45146935 #>>45147139 #>>45147257 #>>45147558 #>>45147682 #>>45148227 #>>45150324 #>>45150567 #>>45151562 #>>45151934 #>>45153210 #
mdp2021 ◴[] No.45144037[source]
> Buying used copies of books

It remains deranged.

Everyone has more than a right to freely read everything that is stored in a library.

(Edit: in fact initially I wrote 'is supposed to' in place of 'has more than a right to' - meaning that "knowledge is there, we made it available: you are supposed to access it, with the fullest encouragement").

replies(3): >>45144141 #>>45145658 #>>45145964 #
vkou ◴[] No.45145658[source]
> Everyone has more than a right to freely read everything that is stored in a library.

Every human has the right to read those books.

And now, this is obvious, but it seems to be frequently missed - an LLM is not a human, and does not have such rights.

replies(2): >>45145778 #>>45147703 #
nl ◴[] No.45145778[source]
Under US law, according to Authors Guild v. Google[1] on the Google book-scanning project, scanning books for indexing is fair use.

Additionally:

> Every human has the right to read those books.

Since when?

I strongly disagree - knowledge should be free.

I don't think the author's arrangement of the words should be free to reproduce (i.e., I think some degree of copyright protection is ethical), but if I want to use a tool to help me understand the knowledge in a book, then I should be able to.

[1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

replies(6): >>45145933 #>>45146371 #>>45147476 #>>45150582 #>>45153091 #>>45153137 #
TheDong ◴[] No.45146371[source]
Knowledge should be free. Unfortunately, OpenAI and most other AI companies are for-profit, and so they vacuum up the commons and produce tooling that is for-profit.

If you use the commons to create your model, perhaps you should be obligated to distribute the model for free (or I guess for the cost of distribution) too.

replies(3): >>45146783 #>>45147015 #>>45148056 #
nl ◴[] No.45147015[source]
I don't pay OpenAI and I use their model via ChatGPT frequently.

By this logic one shouldn't be able to research for a newspaper article at a library.

replies(2): >>45147296 #>>45153067 #
martin-t ◴[] No.45153067{6}[source]
And no doubt you understand that this is the current state, not a stable equilibrium.

They'll either go out of business, or they'll put the better models behind a paywall while offering only weaker ones for free, despite both being trained on the same data.