290 points | nobody9999 | 40 comments
jawns ◴[] No.45187038[source]
I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.
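
A quick back-of-the-envelope check of those figures (a sketch in Python; the ~$3,000-per-work payout and the variable names are assumptions based on what's discussed in this thread, not numbers taken from the settlement papers):

  # Rough sanity check of the settlement math, using figures discussed in this
  # thread rather than numbers from the settlement documents themselves.
  PER_WORK_PAYOUT = 3_000      # approximate per-book payout discussed in the thread
  BOOKS_IN_DATASET = 500_000   # size of the pirated-books dataset mentioned above
  MY_BOOKS = 3                 # number of this author's books found in the dataset

  print(f"Author's share:  ${MY_BOOKS * PER_WORK_PAYOUT:,}")          # $9,000
  print(f"Total fund size: ${BOOKS_IN_DATASET * PER_WORK_PAYOUT:,}")  # $1,500,000,000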

replies(22): >>45187319 #>>45187366 #>>45187519 #>>45187839 #>>45188602 #>>45189683 #>>45189684 #>>45190184 #>>45190223 #>>45190237 #>>45190555 #>>45190731 #>>45191633 #>>45192016 #>>45192191 #>>45192348 #>>45192404 #>>45192630 #>>45193043 #>>45195516 #>>45201246 #>>45218895 #
1. tartoran ◴[] No.45187839[source]
> I think that's fair, considering that two of those books received advances under $20K and never earned out.

It may be fair to you, but what about other authors? Maybe it's not fair to them at all.

replies(2): >>45187873 #>>45189724 #
2. jawns ◴[] No.45187873[source]
Then they can opt out of the class.
replies(1): >>45188608 #
3. gowld ◴[] No.45188608[source]
Or the judge can reject the settlement as insufficient, which is what TFA is about.
replies(1): >>45189797 #
4. terminalshort ◴[] No.45189724[source]
Do they sell their books for more than $3000 per copy? If so, then it isn't fair. Otherwise, they are getting a windfall because of Anthropic's stupidity in not buying the books.
replies(5): >>45189898 #>>45190191 #>>45190448 #>>45192764 #>>45196449 #
5. NoahZuniga ◴[] No.45189797{3}[source]
That doesn't seem to be why the judge rejected the settlement. To me it seems like the judge thought the details weren't worked out enough to tell whether it's reasonable.
6. paulryanrogers ◴[] No.45189898[source]
Some judgements are punitive, to deter future abuse. Otherwise why pay for anything when you can just always steal and pay only what's owed whenever you're caught?
replies(2): >>45190072 #>>45193843 #
7. terminalshort ◴[] No.45190072{3}[source]
Yes, in this particular case the damages are statutory, which means they are specifically punitive and not compensation to the author. This is why it is definitely not unfair to the author. It is a lucky win for them.
replies(1): >>45190232 #
8. godelski ◴[] No.45190191[source]

  | Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.[0]
Please don't be disingenuous. You know that none of the authors were selling their books for $3k apiece, so obviously this is about something more.

  > because of Anthropic's stupidity in not buying the books.
And what about OpenAI, who did the same thing?

What about Meta, who did the same thing?

What about Google, who did the same thing?

What about Nvidia, who did the same thing?

Clearly something should be done, because it's not like these companies can't afford the cost of the books. I mean, Meta recently hired people by giving out >$100m packages and bought a data company for $15bn. Do you think they can't afford to buy the books, videos, or even the porn? We're talking about trillion-dollar companies.

It's been what, a year since Eric Schmidt said to steal everything and let the lawyers figure it out if you become successful?[1] Personally, I'm not a big fan of "the ends justify the means" arguments. It's led to a lot of unrest, theft, wars, and death.

Do you really not think it's possible to make useful products ethically?

[0] https://news.ycombinator.com/newsguidelines.html

[1] https://www.theverge.com/2024/8/14/24220658/google-eric-schm...

replies(3): >>45190454 #>>45190829 #>>45191515 #
9. godelski ◴[] No.45190232{4}[source]
I think you are using a naïve model. You're making the comparison based on "price of book" vs "compensation". Do you think those are the only costs here? Who knows about OP, but I'm willing to bet many of those authors sought legal counsel, which costs money. Opportunity costs are also difficult to measure. Same with lost future income.

I don't think $3k is likely a bad deal, but I still think you're oversimplifying things.

replies(1): >>45190865 #
10. giveita ◴[] No.45190448[source]
If I copy your book and sell a million bootleg copies that compete directly with your book, is that worth the $30 cover price?

This is what generative AI essentially is.

Maybe the payment should be $500/h (say $5k a page) to cover the cost of preparing a human-verified dataset for Anthropic.

replies(4): >>45190525 #>>45190663 #>>45191221 #>>45191261 #
11. janalsncm ◴[] No.45190454{3}[source]
This isn’t a deal to sell their books. The authors are getting $3k per book while maintaining the rights to their IP. The settlement is to avoid statutory damages, which range from $750 to $30k or more per infringement.

One of the consequences of retaining their rights is that they can also sue Meta and Google and OpenAI etc for the same thing.
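
For scale, a minimal sketch of where that $3k per work sits within the statutory range mentioned above (the $750/$30,000/$150,000 tiers are the standard US statutory-damages bounds under 17 U.S.C. § 504(c); the per-work settlement figure and variable names are assumptions from this thread):

  # Where a ~$3,000-per-work settlement sits inside the statutory-damages range
  # (17 U.S.C. § 504(c): $750 to $30,000 per work, up to $150,000 if willful).
  STATUTORY_MIN, STATUTORY_MAX, WILLFUL_MAX = 750, 30_000, 150_000
  SETTLEMENT_PER_WORK = 3_000  # per-book figure discussed in this thread (assumed)

  assert STATUTORY_MIN <= SETTLEMENT_PER_WORK <= STATUTORY_MAX
  print(f"{SETTLEMENT_PER_WORK / STATUTORY_MIN:.0f}x the statutory minimum")         # 4x
  print(f"{SETTLEMENT_PER_WORK / STATUTORY_MAX:.0%} of the ordinary statutory max")  # 10%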

replies(1): >>45190500 #
12. godelski ◴[] No.45190500{4}[source]
I think we are in agreement[0]. I was just focusing on a different part.

[0] https://news.ycombinator.com/item?id=45190232

13. aeon_ai ◴[] No.45190525{3}[source]
The same judge has already determined that training is fair use; Anthropic did in fact buy copies of books and train on those as well.

Thus the $3k per violation is still punitive at (conservatively) 100x the cost of the book.

Given that it is fair use, authors do not have the right to restrict training on their works under copyright law alone.

14. terminalshort ◴[] No.45190663{3}[source]
In that case the damages would be $3000 per copy you sold. Distributing copyrighted work is an entirely different category of offense than simply downloading and consuming it. Anthropic didn't distribute any copies, so the damages are limited to the one copy they pirated. That is not remotely what generative AI is, and it's why the judge ruled that it was perfectly legal to feed the books to the model.
15. terminalshort ◴[] No.45190829{3}[source]
Where is your evidence that Meta, Google, and OpenAI did the same thing? (As for NVIDIA, do they even train models?) Because if they did, why haven't they been sued? This is a garden-variety copyright infringement case and would be a slam-dunk win for the plaintiffs. The only novel part of the case is the claim that the plaintiffs lost on, which establishes precedent that training an LLM is fair use.

> Clearly something should be done because it's not like these companies can't afford the cost of the books

Yes indeed it should, and it has. They have been forced to pay $3000 per book they pirated, which is more than 100x what they would have gained if they had gotten away with it.

IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy. If you want to argue that the penalty should be more, you can do that, but it is completely missing my point. You are talking about what is fair punishment to the companies, and my comment was talking about what is fair compensation to the authors. Those are two completely different things.

replies(3): >>45193777 #>>45195142 #>>45195829 #
16. terminalshort ◴[] No.45190865{5}[source]
This is a class-action suit, so the legal fees are almost certainly being paid on contingency and not out of pocket. And there is no opportunity cost or lost future income here, because this is piracy, not theft. The authors were never deprived of any ability to continue to sell their work through normal channels. They only lost the revenue from the sale of a single copy.
replies(2): >>45193409 #>>45194532 #
17. megaman821 ◴[] No.45191221{3}[source]
I am not sure what types of books you read, but AI has replaced absolutely no books for me.
18. II2II ◴[] No.45191261{3}[source]
The thing is: you aren't distributing copies with generative AI, in any sensible meaning of the word.

Don't get me wrong: I think this is an incredibly bad deal for authors. That said, I would be horrified if it wasn't treated as fair use. It would be incredibly destructive to society, since people would try to use such rulings to chisel away at fair use. Imagine schools that had to pay yearly fees to use books. We know they would do that; they already try to (single-use workbooks, online value-added services). Or look at software. It is already going to be problematic for people who use LLMs. It is already problematic due to patents. Now imagine what would happen if reformulating algorithms that you read in a book was not considered fair use. Or look at books themselves. A huge chunk of non-fiction consists of doing research and re-expressing ideas in non-original terms. Is that fair use? The main difference between that and generative AI is that we can say a machine did it, but is that enough to protect fair use in the conventional sense?

replies(2): >>45192178 #>>45192814 #
19. kelnos ◴[] No.45191515{3}[source]
> And what about $OTHER_AI_COMPANY, who did the same thing?

If there's evidence of this that will stand up in court, they should be sued as well, and they'll presumably lose. If this hasn't happened, or isn't in the works, then I guess they covered their tracks well enough. That's unfortunate, but that's life.

replies(1): >>45193795 #
20. rkagerer ◴[] No.45192178{4}[source]
> Imagine schools that had to pay yearly fees to use books

I feel like we aren't far from that. Wouldn't be surprised if new books get published (in whatever medium) that are licensed out instead of sold.

replies(1): >>45197323 #
22. giveita ◴[] No.45192814{4}[source]
This is parallel to mass surveillance. Surveillance is OK (private eye), so dragnetting is also OK, as it is just scaled-up private detectives. If 1 is OK, then 1+1 is OK. And so, by Peano induction, a googolplex of the OK thing is OK too.
replies(1): >>45201095 #
23. godelski ◴[] No.45193409{6}[source]

  > the legal fees are almost certainly being paid on contingency and not out of pocket.
The legal fees for this lawsuit. Not the legal fees for anyone who went and talked to a lawyer suspecting their material was illegitimately used.

You're treating the system as isolated when it is not.

  > no opportunity cost or lost future income here because this is piracy not theft.
I think you are confused. Yes, it is piracy, but not like the typical piracy most of us do. There's no loss in pirating a movie if you would never have paid to see the movie in the first place.

But there are future costs here, as people will use LLMs to generate books, which is competition. The cost of generating such a book is much lower, allowing for a much cheaper product.

  > They only lost the revenue from the sale of a single copy.
In your effort to simplify things you have only complicated them.
replies(1): >>45193525 #
24. terminalshort ◴[] No.45193525{7}[source]
You are not entitled to protection from future competition, only from loss of sales of your current work. You are not ever entitled to legal fees you pay if you don't file a suit.
replies(1): >>45193907 #
25. godelski ◴[] No.45193777{4}[source]
I mean, you can Google these... They have also been popping up on HN for the last year; it is even referenced in the article, and there's even another post in the sidebar titled "Anthropic Record AI Copyright Pact Sets Bar for OpenAI, Meta"[0], so I really didn't feel it was necessary to provide links. But sure, if you're feeling lazy, I got your back. I'll even limit it to HN posts so you don't even have to leave the site.

  Torrenting:
  Meta Pirating Books[1,2,3]
    - [1] Fun fact, [1] is the most popular post of all time on HN for the search word "torrent" and the 5th ranking for "Meta". [2] is the 16th for "illegal"
  Nvidia [4,5]
  Apple, Nvidia, Anthropic[6]
  GitHub [7,8]
  OpenAI [9,10]
  Google [11]
    - I mean this one was even mentioned in the article from the Anthropic post from a few days ago[12]
I hope that's sufficient. You can find plenty more if you do a good old-fashioned search instead of just using the HN search. But most of these were pretty high-profile stories, so they were pretty quick to find.

  > which establishes precedent that training an LLM is fair use.
I think you misunderstand. The precedent here is over the issue of piracy; it has not set precedent on the issue of fair use. There is ongoing litigation, but there was precedent set in another lawsuit with Meta[13], which is currently going through appeals. I'll give you a head start on that one [14,15]. But the issue of fair use is still being debated. These things take years, and I don't think anyone will be surprised when this stuff lands in some of the highest courts and gets revisited under a different administration.

  > IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy.
Sure. You can have whatever opinion you want. I wasn't arguing about your opinion. I even agreed with it[16]!

But that is a different topic altogether. I still think you've vastly oversimplified the conversation and are thus unintentionally making some naive assumptions. It's the whole reason I said "probably" in [16]. The big difference is just that you're smart enough to figure out how the law works, and I'm smart enough to know that neither of us is a lawyer.

And please don't ask me for more citations unless they are difficult to Google... I think I already set some kinda record here...

  [0] https://archive.is/3oCg8
  [1] https://news.ycombinator.com/item?id=42971446
  [2] https://news.ycombinator.com/item?id=43125840
  [3] https://news.ycombinator.com/item?id=42772771
  [4] https://news.ycombinator.com/item?id=40505480
  [5] https://news.ycombinator.com/item?id=41163032
  [6] https://news.ycombinator.com/item?id=40987971
  [7] https://news.ycombinator.com/item?id=33457063
  [8] https://news.ycombinator.com/item?id=27724042
  [9] https://news.ycombinator.com/item?id=42273817
  [10] https://news.ycombinator.com/item?id=38781941
  [11] https://news.ycombinator.com/item?id=11520633
  [12] https://news.ycombinator.com/item?id=45142885
  [13] https://perkinscoie.com/insights/update/court-sides-meta-fair-use-and-dmca-questions-leaves-door-open-future-challenges
  [14] https://arstechnica.com/tech-policy/2025/07/meta-pirated-and-seeded-porn-for-years-to-train-ai-lawsuit-says/
  [15] https://torrentfreak.com/copyright-lawsuit-accuses-meta-of-pirating-adult-films-for-ai-training/
  [16] https://news.ycombinator.com/item?id=45190232
26. godelski ◴[] No.45193795{4}[source]
I mean, they are being sued? I provided a long list of HN links in the sibling comment. But you know... you can also check Google[0].

[0] https://gprivate.com/6ib6y

27. f33d5173 ◴[] No.45193843{3}[source]
Supposing a book is usually $30, then this would be a factor of 100 above that. That seems fairly punitive to me.
28. godelski ◴[] No.45193907{8}[source]

  > You are not entitled to protection from future competition
What do you think patents, copyrights, trademarks, and all this other stuff are even about?

There are "statutory damages", which account for a wide range of things[0].

Not to mention you just completely ignored what I argued!

Seriously, you've been making a lot of very confident claims in this thread, and they are easy to verify as false. Just google some of your assumptions before you respond. Hell, ask an LLM and it'll tell you! Just don't make assumptions and do zero vetting. It's okay to be wrong, but you're way off base, buddy.

[0] https://en.wikipedia.org/wiki/Statutory_damages

replies(1): >>45195125 #
29. iamsaitam ◴[] No.45194532{6}[source]
"The authors were never deprived of any ability to continue to sell their work through normal channels" this isn't exactly true is it? If the "AI" used their books for training, then it's able to provide information/value/content from them, lowering the incentive for people to buy these books.
replies(1): >>45195133 #
30. vidarh ◴[] No.45195125{9}[source]
Copyright doesn't protect you from "future competition" in the sense meant here of competition from other works.
replies(1): >>45195784 #
31. vidarh ◴[] No.45195133{7}[source]
However, the judge does not appear to believe they have any legal right to protection from that in this case. The settlement is over Anthropic's use of pirated copies instead of buying one copy of each of the works in question.
replies(1): >>45195804 #
32. vidarh ◴[] No.45195142{4}[source]
> As for NVIDIA, do they even train models?

Yes. Nemotron:

https://www.nvidia.com/en-gb/ai-data-science/foundation-mode...

33. jimmydorry ◴[] No.45195784{10}[source]
Copyright protects you from market substitutions (e.g., someone taking your IP and offering an alternative to your work). If a model is trained on your IP, it could certainly be argued that users would no longer need to purchase your book.

"Future competition" is a loosely worded way of saying this.

replies(1): >>45196180 #
34. jimmydorry ◴[] No.45195804{8}[source]
I haven't read this particular case, but typically judges will keep the judgement as narrow as possible... so it may well be the case that these IP owners, or those in similar future cases, also have a legal right to protection from it.
replies(1): >>45196110 #
35. jimmydorry ◴[] No.45195829{4}[source]
> IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy.

Anti-piracy groups send pirates scare letters threatening to sue for tens of thousands of dollars per instance of piracy. Why should it be lower for a company?

36. vidarh ◴[] No.45196110{9}[source]
The judge has already ruled that using books to train AI does not in itself violate US copyright law, and so the surviving claims from the plaintiffs related to Anthropic pirating books.
37. vidarh ◴[] No.45196180{11}[source]
"It could be argued" but the judge in this case has already ruled that the training does not violate copyright. Market substitution only comes into play to determine fair use if copyright has already been infringed.
38. seanhunter ◴[] No.45196449[source]
If you read the copyright text on the back of the title page of a book, buying it doesn’t give you the right to “mechanically reproduce” the book. I would be very surprised if there was a court ruling that didn’t either A) completely strike that notice and say it’s fair game to photocopy or scan books you have bought for any purpose (which is not what courts have held in the past, so it would be a big shift) or B) uphold it and say it also applies to scraping the content of a book for training.

…especially given the US “fair use” doctrine takes into account the effect that a particular use might have on the market for similar works, so the authors are bound to argue that the existence of AI that can reproduce fanfiction-like facsimiles of works at scale is going to poison the well and reduce the market for people spending actual money on future works (whether or not that’s true is another question).

So in my view the court is going to say that buying a book doesn’t give them the right to train on the contents, because that is mechanical reproduction, which is explicitly disallowed by the copyright notice, and they don’t fall under the “fair use” carveout because they affect the future market. There isn’t anywhere else where they were granted the right to use the authors’ works, so the use is disallowed. Obviously no court finding is ever 100% guaranteed, but that really seems the only logically consistent conclusion they could come to.

39. II2II ◴[] No.45197323{5}[source]
You just reminded me that it is a thing, or at least was a thing. Around the time I was leaving the university world, publishers were starting to introduce time-limited ebooks. This not only affected the second-hand book market, but also students, who would have kept their books in the past.
40. rangestransform ◴[] No.45201095{5}[source]
This is unironically what happened with Katz v. United States with regard to the expectation of privacy in public.