
290 points nobody9999 | 1 comments
jawns ◴[] No.45187038[source]
I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.

visarga ◴[] No.45187519[source]
How is it fair? Do you expect $9,000 from Google, Meta, OpenAI, and everyone else? Were your books imitated by AI?

Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?

jawns ◴[] No.45187853[source]
You've misunderstood the case.

The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.

The suit is about Anthropic procuring those materials from a pirated dataset.

The infringement, in other words, happened at the time of procurement, not at the time of training.

If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.

mmargenot ◴[] No.45188132[source]
Do foundation model companies need to license these books or simply purchase them going forward?
sharkjacobs ◴[] No.45188299[source]
> On June 23, 2025, the Court rendered its Order on Fair Use, Dkt. 231, granting Anthropic’s motion for summary judgment in part and denying its motion in part. The Court reached different conclusions regarding different sources of training data. It found that reproducing purchased and scanned books to train AI constituted fair use. Id. at 13-14, 30–31. However, the Court denied summary judgment on the copyright infringement claims related to the works Anthropic obtained from Library Genesis and Pirate Library Mirror. Id. at 19, 31.

https://www.documentcloud.org/documents/26084996-proposed-an...

> reproducing purchased and scanned books to train AI constituted fair use

1. greensoap ◴[] No.45190162{5}[source]
Actually, the court really only said that downloading a pirated book to store in your "library" was bad. The opinion is (intentionally?) ambiguous on whether the decision regarding copies used to train an LLM applies only to scanned books or also to pirated books. The facts found in the case are that the training datasets were made from the "library" copies of books, which included both scans and pirated downloads. The court said the training copies were fair use. The court also said the scanned library copies were fair use. The court found that the pirated library copies were not fair use. The court did not say for certain whether the pirated training copies were fair use.