290 points nobody9999 | 2 comments
jawns ◴[] No.45187038[source]
I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.

replies(22): >>45187319 #>>45187366 #>>45187519 #>>45187839 #>>45188602 #>>45189683 #>>45189684 #>>45190184 #>>45190223 #>>45190237 #>>45190555 #>>45190731 #>>45191633 #>>45192016 #>>45192191 #>>45192348 #>>45192404 #>>45192630 #>>45193043 #>>45195516 #>>45201246 #>>45218895 #
visarga ◴[] No.45187519[source]
How is it fair? Do you expect $9,000 from Google, Meta, OpenAI, and everyone else? Were your books imitated by AI?

Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?

replies(4): >>45187577 #>>45187677 #>>45187811 #>>45187853 #
jawns ◴[] No.45187853[source]
You've misunderstood the case.

The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.

The suit is about Anthropic procuring those materials from a pirated dataset.

The infringement, in other words, happened at the time of procurement, not at the time of training.

If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.

replies(3): >>45188132 #>>45188418 #>>45190047 #
mmargenot ◴[] No.45188132[source]
Do foundation model companies need to license these books or simply purchase them going forward?
replies(2): >>45188266 #>>45188299 #
sharkjacobs ◴[] No.45188299[source]
> On June 23, 2025, the Court rendered its Order on Fair Use, Dkt. 231, granting Anthropic’s motion for summary judgment in part and denying its motion in part. The Court reached different conclusions regarding different sources of training data. It found that reproducing purchased and scanned books to train AI constituted fair use. Id. at 13-14, 30–31. However, the Court denied summary judgment on the copyright infringement claims related to the works Anthropic obtained from Library Genesis and Pirate Library Mirror. Id. at 19, 31.

https://www.documentcloud.org/documents/26084996-proposed-an...

> reproducing purchased and scanned books to train AI constituted fair use

replies(2): >>45188384 #>>45190162 #
thaumasiotes ◴[] No.45188384[source]
The usual analysis was that when you download a book from Library Genesis, that is an instance of copyright infringement committed by Library Genesis. This ruling appears to reverse that analysis.
replies(1): >>45189412 #
papercrane ◴[] No.45189412[source]
Do you have a source for that? MAI Systems Corp. v. Peak Computer, Inc. established that even creating a copy in RAM is considered a "copy" under the Copyright Act and can be infringement.
replies(1): >>45189579 #
parineum ◴[] No.45189579[source]
It's not an issue of where it's being copied, it's who's doing the copying.

Library Genesis has one copy. It then sends you one copy and keeps its own. The entity that violated the _copy_right is the one that copied it, not the one with the copy.

replies(1): >>45189881 #
masfuerte ◴[] No.45189881[source]
There are many copies made as the text travels from Library Genesis to Anthropic. This isn't just of theoretical interest. English law has specific copyright exemptions for transient copies made by internet routers, etc. It doesn't have exemptions for the transient copies made by end users such as Anthropic, and they are definitely infringing.

Of course, American law is different. But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

replies(1): >>45190570 #
1. thaumasiotes ◴[] No.45190570[source]
> But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

Well, the question here is "who made the copy?"

If you advertise in seedy locations that you will send Xeroxed copies of books by mail order, and I order one, and you then send me the copy I ordered, how many of us have committed a copyright violation?

replies(1): >>45194457 #
2. masfuerte ◴[] No.45194457[source]
Copyright law is literally about the copies. A xeroxed book is exactly one copy. Mailing and reading that book doesn't copy it any further. In contrast, you can't do anything with digital media without making another copy.

> "Who made the copy?"

This begs the question. With digital media everybody involved makes multiple copies.