Here is how individuals are treated for massive copyright infringement:
https://investors.autodesk.com/news-releases/news-release-de...
Come up with a better comparison.
First, Authors argue that using works to train Claude’s underlying LLMs
was like using works to train any person to read and write, so Authors
should be able to exclude Anthropic from this use (Opp. 16).
Second, to that last point, Authors further argue that the training was
intended to memorize their works’ creative elements — not just their
works’ non-protectable ones (Opp. 17).
Third, Authors next argue that computers nonetheless should not be
allowed to do what people do.
https://media.npr.org/assets/artslife/arts/2025/order.pdf
The model is very different.
This argument is more like blaming Microsoft Word because someone typed characters into the word processor and produced a copy of an existing book. (Yes, the model makes it a lot easier, but the rationale is the same.) In my mind, the end user prompting the model would be the one potentially infringing.
I do think a big part of the reason Anthropic downloaded millions of books from pirate torrents was that they needed that input data in order to generate the output, their product.
I don't know what to call that, but, IMHO, not sharing those dollars with the creators of the content is clearly wrong.