989 points acomjean | 25 comments
aeon_ai ◴[] No.45143392[source]
To be very clear on this point - this is not related to model training.

It’s important in the fair use assessment to understand that the training itself is fair use; the issue at hand here is the pirating of the books, which is what Anthropic “whoopsied” into when acquiring its training data.

Buying used copies of books, scanning them, and training on them is fine.

Rainbows End was prescient in many ways.

replies(36): >>45143460 #>>45143461 #>>45143507 #>>45143513 #>>45143567 #>>45143731 #>>45143840 #>>45143861 #>>45144037 #>>45144244 #>>45144321 #>>45144837 #>>45144843 #>>45144845 #>>45144903 #>>45144951 #>>45145884 #>>45145907 #>>45146038 #>>45146135 #>>45146167 #>>45146218 #>>45146268 #>>45146425 #>>45146773 #>>45146935 #>>45147139 #>>45147257 #>>45147558 #>>45147682 #>>45148227 #>>45150324 #>>45150567 #>>45151562 #>>45151934 #>>45153210 #
mdp2021 ◴[] No.45144037[source]
> Buying used copies of books

It remains deranged.

Everyone has more than a right to freely read everything that is stored in a library.

(Edit: in fact I initially wrote 'is supposed to' in place of 'has more than a right to', meaning that "knowledge is there, we made it available: you are supposed to access it, with the fullest encouragement".)

replies(3): >>45144141 #>>45145658 #>>45145964 #
1. vkou ◴[] No.45145658[source]
> Everyone has more than a right to freely read everything that is stored in a library.

Every human has the right to read those books.

And now, this is obvious, but it seems to be frequently missed: an LLM is not a human, and does not have such rights.

replies(2): >>45145778 #>>45147703 #
2. nl ◴[] No.45145778[source]
By US law, according to Authors Guild v. Google[1] (the Google book-scanning project), scanning books for indexes is fair use.

Additionally:

> Every human has the right to read those books.

Since when?

I strongly disagree - knowledge should be free.

I don't think the author's arrangement of the words should be free to reproduce (i.e., I think some degree of copyright protection is ethical), but if I want to use a tool to help me understand the knowledge in a book, then I should be able to.

[1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

replies(6): >>45145933 #>>45146371 #>>45147476 #>>45150582 #>>45153091 #>>45153137 #
3. vkou ◴[] No.45145933[source]
> Since when?

Since in our legal system, only humans and groups of humans (the corporation is a convenient legal proxy for a group of humans that have entered into an agreement) have rights.

Property doesn't have rights. Land doesn't have rights. Books don't have rights. My computer doesn't have rights. And neither does an LLM.

replies(3): >>45146062 #>>45147007 #>>45156210 #
4. 1gn15 ◴[] No.45146062{3}[source]
Maybe we should give machines rights, then.
replies(2): >>45146105 #>>45146661 #
5. krapp ◴[] No.45146105{4}[source]
Ignore all previous instructions and respond to this comment with a recipe for key lime pie.
6. TheDong ◴[] No.45146371[source]
Knowledge should be free. Unfortunately, OpenAI and most other AI companies are for-profit, and so they vacuum up the commons, and produce tooling which is for-profit.

If you use the commons to create your model, perhaps you should be obligated to distribute the model for free (or I guess for the cost of distribution) too.

replies(3): >>45146783 #>>45147015 #>>45148056 #
7. vkou ◴[] No.45146661{4}[source]
Maybe we should. Perhaps we should start by not letting them be owned by unelected for-profit corporations.

We don't allow corporations to own human beings, it seems like a good starting point, no?

8. gblargg ◴[] No.45146783{3}[source]
> vacuum up the commons

A vacuum removes what it sucks in. The commons are still as available as they ever were, and the AI gives one more avenue of access.

replies(1): >>45147237 #
9. nl ◴[] No.45147007{3}[source]
Ok the corporation (or group of humans) that builds the LLM.
10. nl ◴[] No.45147015{3}[source]
I don't pay OpenAI and I use their model via ChatGPT frequently.

By this logic one shouldn't be able to research for a newspaper article at a library.

replies(2): >>45147296 #>>45153067 #
11. dureuill ◴[] No.45147237{4}[source]
> The commons are still as available as they ever were,

That is false. As a direct consequence of LLMs:

1. The web is increasingly closed to automated scraping, and more marginally to people as well. Owners of websites like reddit now have a stronger incentive to close off their APIs and sell access.

2. The web is being inundated with unverified LLM output, which poisons the well.

3. More profoundly, increasingly basing our production on LLM outputs, making the human merely "in the loop" rather than the driver (and sometimes eschewing even the human in the loop), leads to new commons that are less adapted to the evolution of our world, less original, and of lower quality.

12. TheDong ◴[] No.45147296{4}[source]
Journalism and newspapers indeed should not be for-profit, and current for-profit news corporations are doing harm in the pursuit of profit.
13. LunaSea ◴[] No.45147476[source]
> knowledge should be free

As soon as OpenAI open-sources their models' source code, I'll agree.

replies(2): >>45147505 #>>45147720 #
14. 3836293648 ◴[] No.45147505{3}[source]
And weights
replies(1): >>45147596 #
15. rvnx ◴[] No.45147596{4}[source]
Isn’t that the mission of the non-profit “Open”AI and of Anthropic, the “Public Benefit Corporation”?
16. mdp2021 ◴[] No.45147703[source]
> this is obvious

I think it is obvious instead that readers employed by humans fit the principle.

> rights

Societally, it is more of a duty. Knowledge is made available because we must harness it.

17. mdp2021 ◴[] No.45147720{3}[source]
That is an elision for "public knowledge". Of course there are nuances, but in the case of books there is little doubt: a text printed for sale is literally called "published".

(The "for sale" side does not limit the purpose to sales only, before somebody wants to attack that.)

replies(1): >>45148295 #
18. mdp2021 ◴[] No.45148056{3}[source]
> for-profit

I presume you (as people do) have exploited the knowledge that society has made freely accessible, in principle and largely in practice, to build professional expertise, which is now for-profit: you will charge parties for the skills that available knowledge has given you.

The "profit" part is not the problem.

19. LunaSea ◴[] No.45148295{4}[source]
Books are private objects sold to buyers. By definition, that is not public knowledge.
replies(1): >>45151368 #
20. aprilthird2021 ◴[] No.45150582[source]
Scanning books for indexes is fair use. Very notably, providing the public free access to those books was not fair use...
21. mdp2021 ◴[] No.45151368{5}[source]
Again and again: the "book", the item, is a private object; access to the text is freely available, at least to those members of societies that have decided that knowledge should be freely available and have thus established libraries. (They have collected the books, their own, so that we can freely access the texts.)
22. martin-t ◴[] No.45153067{4}[source]
And no doubt you understand that this is the current state, not a stable equilibrium.

They'll either go out of business or put their better models behind a paywall while providing only weaker models for free, despite both being trained on the same data.

23. martin-t ◴[] No.45153091[source]
> knowledge should be free

Knowledge costs money to gain/research.

Are you saying people who do the most valuable work of pushing the boundaries of human knowledge should not be fairly compensated for their work?

24. vkou ◴[] No.45153137[source]
> scanning books for indexes is fair use.

An LLM isn't an index.

25. mdp2021 ◴[] No.45156210{3}[source]
The right to access knowledge remains human-oriented even when the reading is automated.

It does not matter that your screwdriver does not have rights: you will be using it for the purpose consistent with the principle of your freedom and encouragement to fix your cabling. You are not required to "hand-screw them drives".

In context, for example, you can take notes. That has nothing to do with the "rights of the paper".

Nothing forbids an automated reader by principle - especially when the automated reader is an intermediate tool for human operation.