Anthropic judge rejects $1.5B AI copyright settlement

(news.bloomberglaw.com)

290 points nobody9999 | 1 comments | 09 Sep 25 08:46 UTC

jawns [09 Sep 25 19:02 UTC] No.45187038

I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.
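A rough check of that figure, assuming the $1.5B is split evenly across roughly 500,000 works and ignoring legal fees and any author/publisher split (both assumptions, not terms quoted from the settlement):

```latex
\frac{\$1{,}500{,}000{,}000}{500{,}000\ \text{works}} = \$3{,}000\ \text{per work},
\qquad 3\ \text{works} \times \$3{,}000 = \$9{,}000
```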

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.

visarga [09 Sep 25 19:32 UTC] No.45187519

>>45187038

How is it fair? Do you expect $9,000 from Google, Meta, OpenAI, and everyone else? Were your books imitated by AI?

Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?

gruez [09 Sep 25 19:35 UTC] No.45187577

>>45187519

>Were your books imitated by AI?

Given that books can be imitated by humans with no compensation, this isn't as strong an argument as you think. Moreover, AFAIK the training itself has been ruled legal, so Anthropic could theoretically have bought the book for $20 (or whatever) and been in the clear, which would obviously have brought in less revenue than the $9k settlement.

visarga [09 Sep 25 19:39 UTC] No.45187621

>>45187577

Copyright should be about copying rights, not statistical similarities. Similarity versus a causal link: a different standard altogether.

Retric [09 Sep 25 19:48 UTC] No.45187751

>>45187621

The entire purpose of training materials is to copy aspects of them. That’s the causal link.

Dylan16807 [09 Sep 25 19:53 UTC] No.45187830

>>45187751

The aspect it's supposed to copy is the statistics of how words work.

And in general, when an LLM is able to recreate text, that's a training error. Recreating text is not the purpose. That's not to excuse it happening, but the distinction matters.
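A toy illustration of that distinction, using a bigram word model as a hypothetical stand-in for an LLM (a large simplification; real models are neural networks over subword tokens). The model stores only next-word counts, i.e. "statistics of how words work", yet when every word in a tiny training set has exactly one observed successor, greedy generation regurgitates the training text verbatim:

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words followed it in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_words):
    """Greedily emit the most frequent next word until no successor is known."""
    out = [start]
    # .get() avoids creating empty entries in the defaultdict
    while len(out) < max_words and counts.get(out[-1]):
        out.append(counts[out[-1]].most_common(1)[0][0])
    return " ".join(out)


# Tiny "training set": every word has exactly one observed successor,
# so the stored statistics fully determine the original sequence.
corpus = "authors write books and models learn word statistics from them"
model = train_bigram(corpus)
print(generate(model, "authors", 50))
```

With a larger, more varied corpus most words have many possible successors and the output becomes a blend; verbatim reproduction is what you get when the statistics are effectively a lookup table for one text.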

program_whiz [09 Sep 25 20:11 UTC] No.45188144

>>45187830

During training, the model learns to predict the exact sequence of words in a text. In other words, it reproduces the text repeatedly as part of its own training. The by-product is that the model weights shift to make that text more likely to be produced by the model; that is the explicit goal. A perfect model would reproduce the text perfectly (0 loss).
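A minimal sketch of that objective in plain Python, with hand-written probabilities standing in for a model's output (the function name and all numbers are illustrative, not real training code, which works on subword tokens and batches). The loss is the average negative log-probability assigned to each actual next token, so it reaches 0 only when every true next token gets probability 1, at which point picking the most likely token at each step reproduces the text exactly:

```python
import math


def next_token_loss(predicted, targets):
    """Mean cross-entropy: average of -log p(actual next token) over positions.

    predicted: list of dicts mapping candidate token -> probability, one per position
    targets:   the tokens that actually came next in the training text
    """
    total = 0.0
    for probs, actual in zip(predicted, targets):
        total += -math.log(probs[actual])
    return total / len(targets)


targets = ["sat", "on", "the", "mat"]  # the text that follows "the cat"

# A model that spreads probability over alternatives incurs positive loss.
uncertain = [
    {"sat": 0.5, "ran": 0.5},
    {"on": 0.8, "under": 0.2},
    {"the": 0.9, "a": 0.1},
    {"mat": 0.4, "rug": 0.6},
]

# A model that puts all probability on the true next token has zero loss.
perfect = [{t: 1.0} for t in targets]

print(next_token_loss(uncertain, targets))
print(next_token_loss(perfect, targets))
# Greedy decoding of the zero-loss model recovers the original text.
print([max(p, key=p.get) for p in perfect])
```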

An absurd real-world analogy: Company Y hires a bunch of workers. It gives them access to millions of books and has them read the books all day. The workers copy the books word by word, but after each word they try to guess the next word that will appear. Eventually, they collectively become quite good at guessing the next word given a prompt, even reproducing large swaths of text almost verbatim. The owner of Company Y claims they owe nothing to the book owners, because it doesn't count as reading the books, and any reproduction is "coincidental" (even though that is the explicit task of the readers). They then use these workers to produce works that compete with the authors of the books, which they never paid for.

It seems many people feel this is "fair use" when it happens on a computer, but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style. If you feel this is still fair use, then you should agree all books should be free to everyone (as well as art, code, music, and any other training material).

gruez [09 Sep 25 20:14 UTC] No.45188179

>>45188144

>but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style

Can you provide an example of someone being successfully sued for "mimicking style", presumably in the US judicial system?

Retric [09 Sep 25 20:35 UTC] No.45188557

>>45188179

Style is an ambiguous term here, as it doesn't map directly to what's being considered. The dispute between "Blurred Lines" and "Got to Give It Up" is often considered one of style, and the Court of Appeals for the Ninth Circuit upheld the copyright infringement verdict.

However, AI has been shown to copy a lot more than what people consider style.