
397 points pyman | 51 comments
1. ramon156 ◴[] No.44488798[source]
Pirating and paying the fine is probably a hell of a lot cheaper than individually buying all these books. I'm not saying this is justified, but what would you have done in their situation?

Saying "they have the money" is not an argument. It's about the amount of effort needed to individually buy, scan, and process millions of pages. If that's already been done for you, why re-do it all?

replies(11): >>44488878 #>>44488900 #>>44488933 #>>44489076 #>>44489255 #>>44489312 #>>44489833 #>>44490433 #>>44491603 #>>44491921 #>>44493173 #
2. TimorousBestie ◴[] No.44488878[source]
150K per work is the maximum fine for willful infringement (which this is).

105B+ is more than Anthropic is worth on paper.

Of course they’re not going to be charged to the fullest extent of the law, they’re not a teenager running Napster in the early 2000s.

replies(3): >>44490409 #>>44493214 #>>44493244 #
3. pyman ◴[] No.44488900[source]
The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.

I'm against Anthropic stealing teachers' work and discouraging them from ever writing again. Some teachers are already saying this (though probably not in California).

replies(6): >>44489126 #>>44489222 #>>44489284 #>>44490693 #>>44491995 #>>44492961 #
4. glimshe ◴[] No.44488933[source]
Isn't "pirating" a felony with jail time, though? That's what I remember from the FBI warning I had to see at the beginning of every DVD I bought (but not "pirated" ones).
replies(2): >>44489024 #>>44490441 #
5. kevingadd ◴[] No.44489076[source]
Google did it the legal way with Google Books, didn't they?
replies(1): >>44489238 #
6. lofaszvanitt ◴[] No.44489126[source]
They won't be needed anymore, once singularity is reached. This might be their thought process. This also exemplifies that the loathed caste system found in India is indeed in place in western societies.

There is no equality, and seemingly there are worker bees who can be exploited, and there are privileged ones, and of course there are the queens.

replies(2): >>44489188 #>>44490686 #
7. pyman ◴[] No.44489188{3}[source]
:D

Note: My definition of singularity isn't the one they use in San Francisco. It's the moment founders who stole the life's work of thousands of teachers finally go to prison, and their datacentres get seized.

replies(1): >>44489324 #
8. CuriouslyC ◴[] No.44489222[source]
If you care so little about writing that AI puts you off it, TBH you're probably not a great writer anyhow.

Writers that have an authentic human voice and help people think about things in a new way will be fine for a while yet.

replies(1): >>44490115 #
9. maeln ◴[] No.44489255[source]
If you wanted to be legit with zero chance of going to court, you would contact publishers, ask to pay for a license to access their catalog for training, and negotiate from that point.

This is what every company using media does (think Spotify and Netflix, but also journals, ad agencies, ...). I don't know why people on HN are giving AI companies a pass for this kind of behavior.

replies(3): >>44489478 #>>44490838 #>>44493519 #
10. glimshe ◴[] No.44489284[source]
That will be sad, although there will still be plenty of great people who will write books anyway.

When it comes to a lot of these teachers, I'll say this: copyright works hand in hand with college and school course book mandates. I've seen plenty of teachers making crazy money off students' backs due to these mandates.

A lot of the content taught in undergrad and school hasn't changed in decades or even centuries. I think we have all the books we'll ever need in certain subjects already, but copyright keeps enriching people who write new versions of these.

11. suyjuris ◴[] No.44489312[source]
Just downloading them is of course cheaper, but it is worth pointing out that, as the article states, they did also buy legitimate copies of millions of books. (This includes all the books involved in the lawsuit.) Based on the judgement itself, Anthropic appears to train only on the books legitimately acquired. Used books are quite cheap, after all, and can be bought in bulk.
replies(1): >>44491385 #
12. lofaszvanitt ◴[] No.44489324{4}[source]
You can bet that this is never gonna happen...
replies(1): >>44490868 #
13. suyjuris ◴[] No.44489423{3}[source]
The judge appears to disagree with you on this. They found that training and selling an LLM are fair use, based on the fact that it is exceedingly transformative, and that the copyright holders are not entitled to any profits thereof due to copyright. (They also did get paid — Anthropic acquired millions of books legally, including all of the authors in this complaint. This would not retroactively absolve them of legal fault for past infringements, of course.)
replies(2): >>44489534 #>>44490502 #
14. pyman ◴[] No.44489534{4}[source]
The trial is scheduled for December 2025. That's when a jury will decide how much Anthropic owes for copying and storing over seven million pirated books.
replies(1): >>44489979 #
15. darkoob12 ◴[] No.44489833[source]
This is not about paying for a single copy. It would still be wrong even if they have bought every single one of those books. It is a form of plagiarism. The model will use someone else's idea without proper attribution.
replies(1): >>44490963 #
16. suyjuris ◴[] No.44489979{5}[source]
Yes, that would be an interesting trial. But it is only about six books, and all claims regarding Claude have been dismissed already. So only the internal copies remain, and there the theory for them being infringing is somewhat convoluted: you have to argue that they are not just for purposes of training (which was ruled fair use), and award damages even though these other purposes never materialised (since by now, they have legal copies of those books). I can see it, but I would not count on there being a trial.
17. 4b11b4 ◴[] No.44490115{3}[source]
Yeah, people will still want to write. They might need new ways to monetize it... that being said, even if people still want to write they may not consider it a viable path. Again, have to consider other monetization.
18. mystified5016 ◴[] No.44490279{3}[source]
No, it isn't.
19. voxic11 ◴[] No.44490409[source]
Even if they don't qualify for willful infringement damages (lets say they have a good faith belief their infringement was covered by fair use) the standard statutory damages for copyright infringement are $750-$30,000 per work.
20. tmaly ◴[] No.44490433[source]
At minimum they should have to buy the book they are deriving weights from.
replies(1): >>44491990 #
21. voxic11 ◴[] No.44490441[source]
Yes criminal copyright infringement (willful copyright infringement done for commercial gain or at a large scale) is a felony.
22. flaptrap ◴[] No.44490502{4}[source]
The fallacy in the 'fair use' logic is that a person acquires a book and learns from it, but a machine incorporates the text. Copyright does not allow one to create a derivative work without permission. Only when the result of the transformation resembles the original work could it be said that it is subject to copyright. Do not regard either of those legal issues as set in concrete yet.
replies(1): >>44490708 #
23. dmix ◴[] No.44490531{3}[source]
A court just ruled on Anthropic and said an LLM response wasn't a form of counterfeiting (ie, essentially selling pirate books on the black market). Although tbf that is the most radical interpretation still being put forward by the lawyers of publishers like NYTimes, despite the obvious flaws.
24. SketchySeaBeast ◴[] No.44490686{3}[source]
> They won't be needed anymore, once singularity is reached.

And it just so happens that this belief lets them burn down whatever they want, because something in the future might happen that absolves them of those crimes.

25. mensetmanusman ◴[] No.44490708{5}[source]
Both a human and a machine learn from it. You can design an LLM that doesn’t spit back the entire text after annealing. It just learns the essence like a human.
replies(1): >>44490817 #
26. badmintonbaseba ◴[] No.44490817{6}[source]
Morally maybe, but AFAIK machines "learning" and creating creative works on their own is not recognized legally, at least certainly not the same way as for people.
replies(1): >>44491070 #
27. edgineer ◴[] No.44490835{3}[source]
The paradigm is that teachers will teach life skills like public speaking and entrepreneurship. Book smarts that can be more effectively taught by AI will be, once schools catch up.
28. ohashi ◴[] No.44490838[source]
Because they are mostly software developers who think it's different because it impacts them.
29. covercash ◴[] No.44490868{5}[source]
When the rich and powerful face zero consequences for breaking laws and ignoring the social contracts that keep our society functioning, you wind up with extreme overcorrections. See Luigi.
replies(1): >>44491472 #
30. jeroenhd ◴[] No.44490963[source]
Legally speaking, we don't know that yet. Early signs are pointing at judges allowing this kind of crap because it's almost impossible for most authors to point out what part of the generated slop was originally theirs.
31. Workaccount2 ◴[] No.44491070{7}[source]
>AFAIK machines "learning" and creating creative works on their own is not recognized legally

Did you read the article? The judge literally just legally recognized it.

32. asadotzler ◴[] No.44491385[source]
Buying a book is not a license to re-sell that content for your own profit. I can't buy a copy of your book, make a million Xeroxes of it, and sell those. The license you get when you buy a book is for a single use, not a license to do whatever you want with the contents of that book.
replies(2): >>44492012 #>>44492144 #
33. achierius ◴[] No.44491472{6}[source]
How extreme is that, really? Not to justify murder: that is clearly bad. But "killing one man" is evidently something we, as a society, consider an "acceptable side-effect" when a corporation does it -- hell, you can kill thousands and get away scot-free if you're big enough.

Luigi was peanuts in comparison.

“THERE were two “Reigns of Terror,” if we would but remember it and consider it; the one wrought murder in hot passion, the other in heartless cold blood; the one lasted mere months, the other had lasted a thousand years; the one inflicted death upon ten thousand persons, the other upon a hundred millions; but our shudders are all for the “horrors” of the minor Terror, the momentary Terror, so to speak; whereas, what is the horror of swift death by the axe, compared with lifelong death from hunger, cold, insult, cruelty, and heart-break? What is swift death by lightning compared with death by slow fire at the stake? A city cemetery could contain the coffins filled by that brief Terror which we have all been so diligently taught to shiver at and mourn over; but all France could hardly contain the coffins filled by that older and real Terror—that unspeakably bitter and awful Terror which none of us has been taught to see in its vastness or pity as it deserves.”

- Mark Twain

34. bmitc ◴[] No.44491603[source]
> I'm not saying this is justified, but what would you have done in their situation?

Individuals would have their lives ruined either from massive fines or jail time.

35. blibble ◴[] No.44491921[source]
> Pirating and paying the fine is probably a hell of a lot cheaper than individually buying all these books.

$500,000 per infringement...

replies(1): >>44492769 #
36. SirMaster ◴[] No.44491990[source]
But should the purchase be like a personal license, or like a commercial license that costs way more?

Because, for example, if you buy a movie on disc, that's a personal license and you can watch it yourself at home. But you can't, say, play it at a large public venue that sells tickets to watch it. You need a different and more expensive license to make money off the usage of the content in a larger capacity like that.

37. NoMoreNicksLeft ◴[] No.44491995[source]
Stealing? In what way?

Training a generative model on a book is the mechanical equivalent of having a human read the book and learn from it. Is it stealing if a person reads the book and learns from it?

replies(3): >>44493226 #>>44496020 #>>44496171 #
38. thedevilslawyer ◴[] No.44492012{3}[source]
What are you on about? The judge has literally said this was not a resale, and is transformative and fair use.
39. suyjuris ◴[] No.44492144{3}[source]
Yes, of course! In this case, the judge identified three separate instances of copying: (1) downloading books without authorisation to add to their internal library, (2) scanning legitimately purchased books to add to their internal library, and (3) taking data from their internal library for the purposes of training LLMs. The purchasing part is only relevant for (2) — there the judge ruled that this is fair use. This makes a lot of sense to me, since no additional copies were created (they destroyed the physical books after scanning), so this is just a single use, as you say. The judge also ruled that (3) is fair use, but for a different reason. (They declined to decide whether (1) is fair use at this point, deferring to a later trial.)
40. Kim_Bruning ◴[] No.44492704{3}[source]
What someone at Anthropic did was download libgen once, then Anthropic figured "wait a minute, isn't that illegal?" , so instead they went and bought 7 million books for real and cut them up to scan them.

Turns out this doesn't quite mitigate downloading them first. (Though frankly, I'm very much against people having to buy 7 million books when someone has already scanned them)

41. jandrese ◴[] No.44492769[source]
And the crazy thing is that might be cheaper when you consider the alternative is to have your lawyers negotiate with the lawyers for the publishing companies for the right to use the works as training data. Not only is it many many billable hours just to draw up the contract, but you can be sure that many companies would either not play ball or set extremely high rates. Finally, if the publishing companies did bring a suit against Anthropic they might be asked to prove each case of infringement, basically to show that a specific work was used in training, which might be difficult since you can't reverse a model to get the inputs. When you're a billion dollar company it's much easier to get the courts to take your side. This isn't like the music companies suing teenagers who had a Kazaa account.
42. js8 ◴[] No.44492961[source]
> The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.

I think this is a fantasy. My father cowrote a Springer book about physics. For the effort, he got like $400 and 6 author copies.

Now, you might say he got a bad deal (or the book was bad), but I don't think hundreds of thousands of authors do significantly better. The reality is, people overwhelmingly write because they want to, not because of money.

43. ◴[] No.44493173[source]
44. eikenberry ◴[] No.44493214[source]
Plus they did it with a profit motive which would entail criminal proceedings.
45. blocko ◴[] No.44493226{3}[source]
Depends on how closely that person can reproduce the original work without license or attribution
replies(1): >>44493472 #
46. dragonwriter ◴[] No.44493244[source]
> 150K per work is the maximum fine for willful infringement

No, it's not.

It's the maximum statutory damages for willful infringement, which this has not been adjudicated to be. It is not a fine; it's an alternative basis of recovery to actual damages plus the infringer's profits attributable to the infringement.

Of course, there's also a very wide range of statutory damages, the minimum (if it is not "innocent" infringement) is $750/work.

> 105B+ is more than Anthropic is worth on paper.

The actual amount of 7 million works times $150,000/work is $1.05 trillion, not $105 billion.

replies(1): >>44493412 #
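The correction above is simple arithmetic. A quick sketch, using only the figures quoted in this thread ($150,000/work willful maximum, $750/work non-innocent minimum, 7 million works; not a legal calculation):

```python
# Back-of-the-envelope statutory-damages ranges from the thread's figures.
works = 7_000_000        # books at issue, per the thread

max_per_work = 150_000   # statutory maximum per work (willful infringement)
min_per_work = 750       # statutory minimum per work (non-"innocent")

max_total = works * max_per_work
min_total = works * min_per_work

print(f"maximum: ${max_total:,}")  # $1,050,000,000,000 -- $1.05 trillion, not $105 billion
print(f"minimum: ${min_total:,}")  # $5,250,000,000 -- $5.25 billion
```

Even at the statutory minimum, the per-work figure times seven million works lands in the billions, which is why the thread keeps returning to whether the full catalog would ever actually be counted.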
47. TimorousBestie ◴[] No.44493412{3}[source]
> It's the maximum statutory damages for willful infringement, which this has not been adjudicated to be. It is not a fine; it's an alternative basis of recovery to actual damages plus the infringer's profits attributable to the infringement.

Yeah, you’re probably right, I’m not a lawyer. The point is that it doesn’t matter what number the law says they should pay, Anthropic can afford real lawyers and will therefore only pay a pittance, if anything.

I’m old enough to remember what the feds did to Aaron Swartz, and I don’t see what Anthropic did that was so different, ethically speaking.

48. lcnPylGDnU4H9OF ◴[] No.44493472{4}[source]
It actually depends on whether or not they reproduce it and especially what they do with the copy after making it.
49. CaptainFever ◴[] No.44493519[source]
> I don't know why people in HN are giving a pass to AI company for this kind of behavior.

As mentioned in The Fucking Article, there's a legal difference between training an AI that largely doesn't repeat things verbatim (à la Anthropic) and redistributing media wholesale (à la Spotify, Netflix, journals, ad agencies).

50. janalsncm ◴[] No.44496020{3}[source]
> In what way?

Downloading the book without paying for it, which is more or less what the judge said.

51. coffeefirst ◴[] No.44496171{3}[source]
But a language model is not a person, it’s a copy machine with a blender inside.

Photocopying books in their entirety for commercial use is absolutely illegal.