Most active commenters
  • greensoap(3)

←back to thread

290 points nobody9999 | 24 comments | | HN request time: 0.001s | source | bottom
Show context
jawns ◴[] No.45187038[source]
I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.

replies(22): >>45187319 #>>45187366 #>>45187519 #>>45187839 #>>45188602 #>>45189683 #>>45189684 #>>45190184 #>>45190223 #>>45190237 #>>45190555 #>>45190731 #>>45191633 #>>45192016 #>>45192191 #>>45192348 #>>45192404 #>>45192630 #>>45193043 #>>45195516 #>>45201246 #>>45218895 #
visarga ◴[] No.45187519[source]
How is it fair? Do you expect 9,000 from Google, Meta, OpenAI, and everyone else? Were your books imitated by AI?

Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?

replies(4): >>45187577 #>>45187677 #>>45187811 #>>45187853 #
1. jawns ◴[] No.45187853[source]
You've misunderstood the case.

The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.

The suit is about Anthropic procuring those materials from a pirated dataset.

The infringement, in other words, happened at the time of procurement, not at the time of training.

If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.

replies(3): >>45188132 #>>45188418 #>>45190047 #
2. mmargenot ◴[] No.45188132[source]
Do foundation model companies need to license these books or simply purchase them going forward?
replies(2): >>45188266 #>>45188299 #
3. bhickey ◴[] No.45188266[source]
Probably the latter.
4. sharkjacobs ◴[] No.45188299[source]
> On June 23, 2025, the Court rendered its Order on Fair Use, Dkt. 231, granting Anthropic’s motion for summary judgment in part and denying its motion in part. The Court reached different conclusions regarding different sources of training data. It found that reproducing purchased and scanned books to train AI constituted fair use. Id. at 13-14, 30–31. However, the Court denied summary judgment on the copyright infringement claims related to the works Anthropic obtained from Library Genesis and Pirate Library Mirror. Id. at 19, 31.

https://www.documentcloud.org/documents/26084996-proposed-an...

> reproducing purchased and scanned books to train AI constituted fair use

replies(2): >>45188384 #>>45190162 #
5. thaumasiotes ◴[] No.45188384{3}[source]
The usual analysis was that when you download a book from Library Genesis, that is an instance of copyright infringement committed by Library Genesis. This ruling appears to reverse that analysis.
replies(1): >>45189412 #
6. gowld ◴[] No.45188418[source]
I thought that distribution of copyrighted materials was legally encumbered, not reception thereof.
replies(3): >>45188545 #>>45188918 #>>45189923 #
7. lawlessone ◴[] No.45188545[source]
Did they use a torrent? If they used a torrent isn't it likely they distributed it while downloading it?
replies(1): >>45189345 #
8. thayne ◴[] No.45188918[source]
Do you have a source for that? My understanding was that both were illegal, although of course media companies have an interest in making people believe that even if it isn't true.
9. gkbrk ◴[] No.45189345{3}[source]
Surely a state-of-the-art tech company would know how to disable seeding.
replies(1): >>45189599 #
10. papercrane ◴[] No.45189412{4}[source]
Do you have a source for that because MAI Systems Corp. v. Peak Computer, Inc established that even creating a copy in RAM is considered a "copy" under the Copyright Act and can be infringement.
replies(1): >>45189579 #
11. parineum ◴[] No.45189579{5}[source]
It's not an issue of where it's being copied, it's who's doing the copying.

Library Genesis has one copy. It then sends you one copy and keeps it's own. The entity that violated the _copy_right is the one that copied it, not the one with the copy.

replies(1): >>45189881 #
12. LeoPanthera ◴[] No.45189599{4}[source]
BitTorrent clients will not send data to clients which aren't uploading, as far as I know.
13. masfuerte ◴[] No.45189881{6}[source]
There are many copies made as the text travels from Library Genesis to Anthropic. This isn't just of theoretical interest. English law has specific copyright exemptions for transient copies made by internet routers, etc. It doesn't have exemptions for the transient copies made by end users such as Anthropic, and they are definitely infringing.

Of course, American law is different. But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

replies(1): >>45190570 #
14. adrr ◴[] No.45189923[source]
Downloading is making a copy and covered by copyright law. Its also covered by statutory damages clause of up to $150k per violation if willful. I assume Anthropic knew they were using pirated books.
15. greensoap ◴[] No.45190047[source]
A point of clarifications and some questions.

The portion the court said was bad was not Anthropic getting books from pirated sites to train its model. The court opined that training the model was fair use and did not distinguish between getting the books from pirated sites or hard copy scans. The part the court said was bad, which was settled, was Anthropic getting books from a pirate site to store in a general purpose library.

--

  "To summarize the analysis that now follows, the use of the books at issue to train Claude
  and its precursors was exceedingly transformative and was a fair use under Section 107 of the
  Copyright Act. And, the digitization of the books purchased in print form by Anthropic was. 
  also a fair use but not for the same reason as applies to the training copies. Instead, it was a
  fair use because all Anthropic did was replace the print copies it had purchased for its central
  library with more convenient space-saving and searchable digital copies for its central
  library — without adding new copies, creating new works, or redistributing existing copies.
  However, Anthropic had no entitlement to use pirated copies for its central library. Creating a
  permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy."

  "Because the legal issues differ between the *library copies* Anthropic purchased and
  pirated, this order takes them in turn."

--

Questions

As an author do you think it matters where the book was copied from? Presumably, a copyright gives the author the right to control when a text is reproduced and distributed. If the AI company buys a book and scans it, they are reproducing the book without a license, correct? And fair use is the argument that even though they violated the copyright, they are execused. In a pure sense, if the AI company copied (assuming they didn't torrent back the book) from a "pirate source" why is that copy worse then if they copied from a hard book?

replies(2): >>45190149 #>>45190256 #
16. 8note ◴[] No.45190149[source]
> AI company buys a book and scans it, they are reproducing the book without a license, correct

isn't digitizing your own copies as backups and personal use fine? so long as you dont give away the original while keeping the backups. similarly, dont give away the digital copies.

replies(1): >>45190932 #
17. greensoap ◴[] No.45190162{3}[source]
Actually, the court really only said downloading a pirated book to store in your "library" was bad. The opinion is intentionally? ambiguous on whether the decision regarding copies used to train an LLM applies only to scanned books or also to pirated books. The facts found in the case are the training datasets were made from the "library" copies of books that included scans and pirated downloads. And the court said the training copies were fair use. The court also said the scanned library copies were fair use. The court found that the pirated library copies was not fair use. The court did not say for certain whether the pirated training copies were fair use.
18. cortesoft ◴[] No.45190256[source]
> If the AI company buys a book and scans it, they are reproducing the book without a license, correct?

No? I think there are a lot more details that need to be known before answering this question. It matters what they do with it after they scan it.

replies(1): >>45190417 #
19. greensoap ◴[] No.45190417{3}[source]
That is only relevant to whether it is fair use not to whether the copying is an infringement. Fair use is what is called an affirmative defense -- it means that yes what I did was technically a violation but is forgiven. So on technicalities the copying is an infringement but that infringement is "okay" because there is a fair use. A different scenario is if the copyright owner gives you a license to copy the work (like open source licenses). In that scenario the copying was not an infringement because a license exists.
replies(2): >>45190558 #>>45193959 #
20. cortesoft ◴[] No.45190558{4}[source]
If I have have an app on my phone that lets me point my phone at a page to scan, OCR, and read the page out loud to me, it wouldn't even require fair use, would it?
21. thaumasiotes ◴[] No.45190570{7}[source]
> But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

Well, the question here is "who made the copy?"

If you advertise in seedy locations that you will send Xeroxed copies of books by mail order, and I order one, and you then send me the copy I ordered, how many of us have committed a copyright violation?

replies(1): >>45194457 #
22. esrauch ◴[] No.45190932{3}[source]
It is, Google Books did it over a decade ago (bought up physical books and scanned them all). There were some rulings about how much of a snippet they were allowed to show end users as fair use, but I'm fairly sure the actual scanning and indexing of the books was always allowed.
23. gpm ◴[] No.45193959{4}[source]
> Fair use is what is called an affirmative defense

Yes

> it means that yes what I did was technically a violation but is forgiven

Not at all. All "affirmative defence" means is that procedural the burden is on me to establish that I was not violating the law. The law isn't "you can't do the thing", rather it is "you can't do the thing unless its like this". There is no violation, there is no forgiveness as there is nothing to forgive, because it was done "like this" and doing it "like this" doesn't violate the law in the first place.

24. masfuerte ◴[] No.45194457{8}[source]
Copyright law is literally about the copies. A xeroxed book is exactly one copy. Mailing and reading that book doesn't copy it any further. In contrast, you can't do anything with digital media without making another copy.

> "Who made the copy?"

This begs the question. With digital media everybody involved makes multiple copies.