393 points pyman | 61 comments
dehrmann ◴[] No.44491718[source]
The important parts:

> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use

> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"

It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.

replies(6): >>44491820 #>>44491944 #>>44492844 #>>44494100 #>>44494132 #>>44494944 #
6gvONxR4sf7o ◴[] No.44491944[source]
You skipped quotes about the other important side:

> But Alsup drew a firm line when it came to piracy.

> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."

That is, he ruled that

- buying, physically cutting up, physically digitizing books, and using them for training is fair use

- pirating the books for their digital library is not fair use.

replies(6): >>44492103 #>>44492512 #>>44492665 #>>44493580 #>>44493641 #>>44495079 #
1. throwawayffffas ◴[] No.44492103[source]
So all they have to do is go and buy a copy of each book they pirated. They will have ceased and desisted.
replies(3): >>44492200 #>>44492352 #>>44493451 #
2. superfrank ◴[] No.44492200[source]
I'm trying to find the quote, but I'm pretty sure the judge specifically said that going and buying the book after the fact won't absolve them of liability. He said that for the books they pirated, they broke the law and should stand trial for that, and they cannot go back and un-break it by buying a copy now.

Found it: https://www.nbcnews.com/tech/tech-news/federal-judge-rules-c...

> “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” [Judge] Alsup wrote, “but it may affect the extent of statutory damages.”

replies(4): >>44492716 #>>44492936 #>>44493820 #>>44493889 #
3. dragonwriter ◴[] No.44492352[source]
> So all they have to do is go and buy a copy of each book they pirated.

No, that doesn't undo the infringement. At most, that would mitigate actual damages, but actual damages aren't likely to be important, given that statutory damages are an alternative and are likely to dwarf actual damages. (It may also figure into how the court assigns statutory damages within the very large range available for those, but that range does not go down to $0.)

> They will have ceased and desisted.

"Cease and desist" is just to stop incurring additional liability. (A potential plaintiff may accept that as sufficient to not sue if a request is made and the potential defendant complies, because litigation is uncertain and expensive. But "cease and desist" doesn't undo wrongs and neutralize liability when they've already been sued over.)

replies(1): >>44492771 #
4. zoklet-enjoyer ◴[] No.44492716[source]
Did they really steal if they didn't deprive anyone of their copy? I don't think copying is theft.
replies(6): >>44492775 #>>44492784 #>>44492861 #>>44492939 #>>44493045 #>>44493502 #
5. rockemsockem ◴[] No.44492771[source]
> So all they have to do is go and buy a copy of each book they pirated.

For anyone else who wants to do the same thing though this is likely all they need to do.

Cutting up and scanning books is hard work and actually doing the same thing digitally to ebooks isn't labor free either, especially when they have to be downloaded from random sites and cleaned from different formats. Torrenting a bunch of epubs and paying for individual books is probably cheaper

6. badlibrarian ◴[] No.44492775{3}[source]
"Tell it to the Judge..."
7. kjkjadksj ◴[] No.44492784{3}[source]
You may not think it is but the law does.
replies(1): >>44492922 #
8. axus ◴[] No.44492861{3}[source]
Agreed, the judge should avoid slang or even commonly accepted synonyms in an official ruling. The charge is not for theft.

Substitute infringement for theft.

9. buildbot ◴[] No.44492922{4}[source]
The law says it’s copyright infringement, not theft.
10. freejazz ◴[] No.44492936[source]
They also argued that they in no way could ever actually license all the materials they ingested
replies(1): >>44493194 #
11. ◴[] No.44492939{3}[source]
12. hadlock ◴[] No.44493045{3}[source]
It's copyright infringement, which is not theft; the two are legally distinct. This is partly why the "you wouldn't download a car" copyright ads were so widely mocked.
replies(1): >>44493834 #
13. dmd ◴[] No.44493194{3}[source]
I love this argument so much. "But judge, there's no way I could ever afford to buy those jewels, so stealing them must be OK."
replies(1): >>44493585 #
14. tzs ◴[] No.44493451[source]
Generally you don't want laws to work that way. You want to set the penalties so that they discourage violating the law.

Setting the penalty to what it would have cost to obey the law in the first place does the opposite.

replies(1): >>44493756 #
15. fortran77 ◴[] No.44493502{3}[source]
It's fine that you think that way. But this is a discussion of the laws of the United States of America and rulings by American courts, not a discussion of your own legal theories.
replies(1): >>44493762 #
16. AnthonyMouse ◴[] No.44493585{4}[source]
The argument is more along the lines of, negotiating with millions of individuals each over a single copy of a work would cause the transaction costs to exceed the payments, and that kind of efficiency loss is the sort of thing fair use exists to prevent. It's not socially beneficial for the law to require you to create $2 in deadweight loss in order to transfer $1, and the cost to the author of not selling a single additional copy is not the thing they were really objecting to.
replies(6): >>44493769 #>>44493884 #>>44495038 #>>44495745 #>>44495819 #>>44496146 #
17. AnthonyMouse ◴[] No.44493756[source]
That's for criminal laws where prosecutorial discretion can then (in principle) be used in borderline cases to prevent unjust outcomes.

If you give people a claim for damages which is an order of magnitude larger than their actual damages, it encourages litigiousness and becomes a vector for shakedowns because the excessive cost of losing pressures innocent defendants to settle even if there was a 90% chance they would have won.

Meanwhile both parties have the incentive to settle in civil cases when it's obvious who is going to win, because a settlement to pay the damages is cheaper than the cost of going to court and then having to pay the same damages anyway. Which also provides a deterrent to doing it to begin with, because even having to pay lawyers to negotiate a settlement is a cost you don't want to pay when it's clear that what you're doing is going to have that result.

And when the result isn't clear, penalizing the defendant in a case of first impression isn't just either, because it wasn't clear and punitive measures should be reserved for instances of unambiguous wrongdoing.

replies(1): >>44493893 #
18. hnlmorg ◴[] No.44493762{4}[source]
The GP isn’t talking about some edge case legal dilemma that requires a lawyer or judge to comment. It’s already widely documented that copyright infringement is legally distinct from theft.
19. exe34 ◴[] No.44493769{5}[source]
That's right, so I can't individually discuss terms with each and every media creator, so from now on, I can just pirate everything.
replies(2): >>44493802 #>>44495408 #
20. AnthonyMouse ◴[] No.44493802{6}[source]
Needing a copy of one book you're going to spend a week reading has a lot less overhead than needing a copy of every book that you're going to process with a computer in bulk.
replies(1): >>44494039 #
21. ◴[] No.44493820[source]
22. __MatrixMan__ ◴[] No.44493834{4}[source]
Fun fact, they didn't have the rights to use the font they used for those commercials: https://news.ycombinator.com/item?id=43775926
replies(1): >>44494756 #
23. freejazz ◴[] No.44493884{5}[source]
> and that kind of efficiency loss is the sort of thing fair use exists to prevent.

No it's not. And you ever heard of a publishing house? They don't need to negotiate with every single author individually. That's preposterous.

replies(2): >>44493908 #>>44495777 #
24. irthomasthomas ◴[] No.44493889[source]
Is copyright in America different to Britain? There, it is legal to download books you don't own. Only distribution is a crime, which most torrenters break by seeding.
replies(3): >>44494662 #>>44495000 #>>44496102 #
25. badlibrarian ◴[] No.44493893{3}[source]
Statutory damages were written into the first federal copyright law in 1790, and earlier in state law (specified in Pounds because the dollar hadn't been invented yet).
replies(1): >>44494193 #
26. AnthonyMouse ◴[] No.44493908{6}[source]
It kind of is though?

It's not the only reason fair use exists, but it's the thing that allows e.g. search engines to exist, and that seems pretty important.

> And you ever heard of a publishing house? They don't need to negotiate with every single author individually. That's preposterous.

There are thousands of publishing houses and millions of self-published authors on top of that. Many books are also out of print or have unclear rights ownership.

replies(1): >>44494139 #
27. recursive ◴[] No.44494039{7}[source]
I like to glance at the cover art. I can do ten per second when I really get into my flow state. Sometimes I read them also, but that's incidental.
replies(1): >>44494092 #
28. AnthonyMouse ◴[] No.44494092{8}[source]
If you go to the book store and glance at all the cover art without buying any of them, do you expect to be sued for this?
replies(1): >>44494224 #
29. freejazz ◴[] No.44494139{7}[source]
>It kind of is though?

No, it kinda isn't. Show me anything that supports this idea beyond your own immediate conjecture right now.

>It's not the only reason fair use exists, but it's the thing that allows e.g. search engines to exist, and that seems pretty important.

No, that's the transformative element of what a search engine provides. Search engines are not legal because they can't contact each licensor, they are legal because they are considered hugely transformative features.

>There are thousands of publishing houses and millions of self-published authors on top of that. Many books are also out of print or have unclear rights ownership.

Okay, and? How many customers does Microsoft bill on a monthly basis?

replies(1): >>44494442 #
30. AnthonyMouse ◴[] No.44494193{4}[source]
The first federal copyright law in 1790:

https://copyright.gov/about/1790-copyright-act.html

Specified in dollars because dollars had been invented (in 1789), but in the amount of one half of one dollar, i.e. $0.50. That's 1790 dollars, of course, so a little under $20 today. (There was basically no inflation for the first 100+ years of that because the US dollar was still backed by precious metals then; a dollar was worth slightly more in 1900 than in 1790.)

That seems more like an attempt to codify some amount of plausible actual damages so people aren't arguing endlessly about valuations, rather than an attempt to impose punitive damages. Most notably because -- unlike the current method -- it scales with the number of sheets reproduced.

replies(1): >>44494520 #
31. freejazz ◴[] No.44494224{9}[source]
If you do that and reproduce the covers or the protected elements thereof, you should absolutely expect to be sued.
replies(2): >>44495108 #>>44495135 #
32. AnthonyMouse ◴[] No.44494442{8}[source]
> Show me anything that supports this idea beyond your own immediate conjecture right now

It's inherent in the nature of the test. The most important fair use factor is the effect on the market for the work, so if the use would be uneconomical without fair use then the effect on the market is negligible because the alternative would be that the use doesn't happen rather than that the author gets paid for it.

> No, that's the transformative element of what a search engine provides. Search engines are not legal because they can't contact each licensor, they are legal because they are considered hugely transformative features.

To make a search engine you have to do two things. One is to download a copy of the whole internet, the other is to create a search index. I'm talking about the first one, you're talking about the second one.

> Okay, and? How many customers does Microsoft bill on a monthly basis?

Microsoft does this with an automated system. There is no single automated system where you can get every book ever written, and separately interfacing with all of the many systems needed in order to do it is the source of the overhead.

replies(1): >>44494815 #
33. badlibrarian ◴[] No.44494520{5}[source]
My fault for the hanging clause: nearly a dozen state laws preceded it and used pounds. Mostly because they were based on the British law and also because the war made a mess of the currency situation.

Statutory damages were added to reduce the burden on plaintiffs. Which encourages people to stay in line. How well this worked out and what it means when some company nobody heard of 4 years ago downloads a billion copyrighted pages and raises $3.5 billion against a $60 billion valuation...

Well suddenly $20/page still sounds about right.

replies(1): >>44494685 #
34. rahimnathwani ◴[] No.44494662{3}[source]
What do you mean by 'it is legal'?

Do you mean:

A) It's not a criminal offence?

B) The copyright owner cannot file a civil suit for damages?

C) Something else?

replies(1): >>44494958 #
35. AnthonyMouse ◴[] No.44494685{6}[source]
The <$20/page was the same for maps and charts, i.e. things that typically have a single page in the entire work, and came from a time when printing was done a page at a time, i.e. you'd lay out a page and print as many copies of that page as you'd expect to make copies of the entire book, then hide them somewhere else while you print the next page. It was basically a proxy for the number of copies of the work they caught you trying to make, not an attempt to turn a single copy of a 1000 page book into a 1000x multiplier on liability. Notice that otherwise you're letting the infringer choose the amount of the damages, because a larger page size or tighter layout would fit more words per page and therefore have fewer pages per book. (How many "pages" is an HTML document with infinite scroll?)

> Statutory damages were added to reduce the burden on plaintiffs. Which encourages people to stay in line.

It encourages people to not spend a lot of resources speculating about damages. That doesn't mean you need the amount to be punitive rather than compensatory.

replies(1): >>44494751 #
36. badlibrarian ◴[] No.44494751{7}[source]
Agree that a photo of a celebrity and a film containing that celebrity shouldn't have the same number. But a large punitive number in the context of willful infringement seems right to me. And in practice it's all negotiated down anyway, as evidenced by Internet Archive's fourth 30-day stay of its pending $600+ million lawsuit.
replies(1): >>44494855 #
37. gghffguhvc ◴[] No.44494756{5}[source]
Or the music. It was originally made as a one off for a film festival. Movie industry defended the lawsuit over the music.
38. freejazz ◴[] No.44494815{9}[source]
>It's inherent in the nature of the test. The most important fair use factor is the effect on the market for the work, so if the use would be uneconomical without fair use then the effect on the market is negligible because the alternative would be that the use doesn't happen rather than that the author gets paid for it.

No, that's not the most important factor. The transformative factor is the most important. Effect on market for the work doesn't even support your argument anyway. Your argument is about the cost of making the end product, which is totally distinct from the market effects on the copyright holder when the infringer makes and releases the infringing product.

>To make a search engine you have to do two things. One is to download a copy of the whole internet, the other is to create a search index. I'm talking about the first one, you're talking about the second one.

So? That doesn't make you right. Go read the opinions, dude. This isn't something that's actually up for debate. Search engines are fair uses because of their transformative effect, not because they are really expensive otherwise. Your argument doesn't even make sense. By that logic, anything that's expensive becomes a fair use. It's facially ridiculous. Them being expensive is neither sufficient nor necessary for them to be a fair use. Their transformative nature is both sufficient and necessary to be found a fair use. Full stop.

>Microsoft does this with an automated system. There is no single automated system where you can get every book ever written, and separately interfacing with all of the many systems needed in order to do it is the source of the overhead.

Okay, and? They don't need to get every single book ever written. The libraries they pirated do not consist of "every single book ever written". It's hard to take this argument in good faith because you're being so ridiculous.

replies(1): >>44495073 #
39. AnthonyMouse ◴[] No.44494855{8}[source]
"In practice it's negotiated down anyway" is precisely the issue. If they bring a questionable case against you and you think there's a significant chance you could win, but then there's a small chance you get bankrupted, there is unreasonable pressure for you to settle even if the plaintiffs are in the wrong.
replies(1): >>44495071 #
40. irthomasthomas ◴[] No.44494958{4}[source]
> Only distribution is a crime
replies(2): >>44495024 #>>44495043 #
41. throwawayffffas ◴[] No.44495000{3}[source]
I think it's very similar in both countries, but you have got it wrong. Downloading a book without permission is copyright infringement in both countries, regardless of whether you distribute it.

In the UK it's a criminal offense if you distribute a copyrighted work with the intent to make gain or with the expectation that the owner will make a loss.

Gain and loss are only financial in this context.

Meaning that in both countries the copyright owner can sue you for copyright infringement.

42. throwawayffffas ◴[] No.44495024{5}[source]
Only distribution with the intent to make money is a crime. If you are doing it for free you are not criminally liable. Unless I am missing something.
43. throwawayffffas ◴[] No.44495038{5}[source]
I don't even think their argument is about the money, I think it's more like we couldn't possibly find all these works in any other practical way.
44. rahimnathwani ◴[] No.44495043{5}[source]
What relevance does that have to the present case? The judge, in this civil matter, said there would be a trial. He didn't say anything about it being a criminal trial. The strings 'crim' and 'felon' do not appear in the ruling.

  We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness).
replies(1): >>44495396 #
45. badlibrarian ◴[] No.44495071{9}[source]
I'm not sure what a "questionable case" for willful copyright infringement might look like. Or an example where someone was clearly in the right and got screwed. It isn't the debtor's prison era.

Four factor test seems to be working, even in this case. Don't love it (it goes against my values and what I need to do in my job) but I get it.

Edit: we've triggered HN's patience for this discussion and it's now blocking replies. You do seem a bit long on Google and short on practical experience here. How else would you propose these types of disagreements get sorted? ("Anyone can be sued for anything" notwithstanding.)

There are explicitly no punitive damages in US Copyright law. And the "willful" provision in practice means demonstrating ongoing disregard, after being informed. It's a long walk to the end of that plank.

replies(1): >>44495089 #
46. AnthonyMouse ◴[] No.44495073{10}[source]
> No, that's not the most important factor. The transformative factor is the most important.

It's a four factor test because all of the factors are relevant, but if the use has negligible effect on the market for the work then it's pretty hard to get anywhere with the others. For example, for cases like classroom use, even making verbatim copies of the entire work is often still fair use. Buying a separate copy for each student to use for only a few minutes would make that use uneconomical.

> Effect on market for the work doesn't even support your argument anyway. Your argument is about the cost of making the end product, which is totally distinct from the market effects on the copyright holder when the infringer makes and releases the infringing product.

We're talking about the temporary copies they make during training. Those aren't being distributed to anyone else.

> So? That doesn't make you right.

Making a copy of everything on the internet is a prerequisite to making a search engine. It's something you have to do as a step to making the index, which is the transformative step. Are you suggesting that doing the first step is illegal or what do you propose justifies it?

> By that logic, anything that's expensive becomes a fair use. It's facially ridiculous.

Anything with unreasonably high transaction costs. Why is that ridiculous? It doesn't exempt any of the normal stuff like an individual person buying an individual book.

> They don't need to get every single book ever written.

They need to get as many books as possible, with the platonic ideal being every book. Whether or not the ideal is feasible in practice, the question is whether it's socially beneficial to impose a situation with excessively high transaction costs in order to require something with only trivial benefit to authors (potentially selling one extra copy).

replies(1): >>44495469 #
47. AnthonyMouse ◴[] No.44495089{10}[source]
> I'm not sure what a "questionable case" for willful copyright infringement might look like.

You did anything which it's not clear whether it's fair use or not. Willfulness is whether you knew you were doing it, not whether you knew whether it was fair use, which in many cases nobody knows until a court decides it, hence the problem.

You have to do it in order to get into court and find out if you're allowed to do it (a ridiculous prerequisite to begin with), and then if it goes against you, you have to pay punitive damages?

48. ◴[] No.44495108{10}[source]
49. AnthonyMouse ◴[] No.44495135{10}[source]
So for example, if the bookstore has a nice 4k surveillance camera and you have access to it because you work there, sitting at home and using it to look at the cover art on all the books on display is something you'd expect to be sued over?
replies(2): >>44495454 #>>44495759 #
50. Aeolun ◴[] No.44495396{6}[source]
There can always be a trial, even if nothing was done to warrant it.

I think the distinction between civil and criminal trials is smaller in my home country. The fact that there is a trial at all implies that someone committed a ‘crime’.

51. Aeolun ◴[] No.44495408{6}[source]
This is literally why a lot of people pirate content, yes. It’s pretty much always the only way to obtain the content, even if you are otherwise fine with paying for it.
replies(1): >>44495752 #
52. freejazz ◴[] No.44495454{11}[source]
Re-read my comment: "If you do that and reproduce the covers or the protected elements thereof"

This conversation becomes incredibly unenjoyable when you pull rhetorical techniques like completely ignoring the entirety of what I wrote.

53. freejazz ◴[] No.44495469{11}[source]
>It's a four factor test because all of the factors are relevant, but if the use has negligible effect on the market for the work then it's pretty hard to get anywhere with the others. For example, for cases like classroom use, even making verbatim copies of the entire work is often still fair use. Buying a separate copy for each student to use for only a few minutes would make that use uneconomical.

All four factors are not equally relevant which is something described in pretty much every single fair use opinion. Educational uses are educational uses and considered fair because of their educational purpose (purpose is one of the factors), again, not because it's expensive. Maybe next time try googling or using ChatGPT "fair use educational".

>We're talking about the temporary copies they make during training. Those aren't being distributed to anyone else.

It's your argument. Not mine. You do not understand the market harm factor and it has nothing to do with Anthropic's transaction costs. That's just fully outright absolutely incorrect application of law.

>Making a copy of everything on the internet is a prerequisite to making a search engine. It's something you have to do as a step to making the index, which is the transformative step. Are you suggesting that doing the first step is illegal or what do you propose justifies it?

The transformative step is why it's a fair use, not the "market harm" (which you misunderstand) or the made up argument that it's "too expensive". In fact, I said this like every single turn in our conversation so it's a bit perplexing to me that you can now ask me "do you mean that it being transformative is what makes it legal" when that was my exact argument three times.

>Anything with unreasonably high transaction costs. Why is that ridiculous? It doesn't exempt any of the normal stuff like an individual person buying an individual book.

It's ridiculous because of the example I gave. Things being expensive is not a defense to copyright infringement and copyright law has no obligation to make expensive business models work. Copyright has an obligation to make transformative business models work because of the overall good they provide to society. Describing it as a "transaction cost" just kicks the can down the road even further and doesn't deal with the substance, either. They could have gone to the major publishers and licensed books from them. They didn't. That's generally who they are being sued by. When they are being sued by copyright owners in the fringe examples you pointed to, they will become relevant then.

>They need to get as many books as possible, with the platonic ideal being every book. Whether or not the ideal is feasible in practice, the question is whether it's socially beneficial to impose a situation with excessively high transaction costs in order to require something with only trivial benefit to authors (potentially selling one extra copy).

Lol dude, it was your example, not mine. They do not need every single book. They aren't being sued over every single book anyway, so it's totally beside the point.

54. johnnyanmac ◴[] No.44495745{5}[source]
They can. That's how any media service from Spotify to Netflix to Audible has to do things.

They simply don't want to and think they can skirt the law while the judges catch up.

55. johnnyanmac ◴[] No.44495752{7}[source]
Yes, and it's technically copyright infringement, even for private use. It's just that damages and enforcement are infeasible.

But if you tried to open a black market selling that media: you'd be hunted down to the ends of the earth. Or to China/North Korea, at least.

replies(1): >>44496004 #
56. johnnyanmac ◴[] No.44495759{11}[source]
Probably not sued, but it's possible to be. They'd probably just fire you instead.

Having access to a camera doesn't permit you to take the footage home to review. The company still owns that footage, after all.

Now, if you had your own camera recording everything at your desk... I guess that falls into one or two party states.

57. johnnyanmac ◴[] No.44495777{6}[source]
>They don't need to negotiate with every single author individually.

Yeah they do. What do you think the employees of a publishing house do? They make deals, work with authors, and accept/reject pitches. They 100% need to make sure every work is under a negotiated contract.

58. homebrewer ◴[] No.44495819{5}[source]
I used to order books in English from the US before shipping costs became prohibitive and the cost of shipping the book went to about twice to thrice the cost of the book itself. Is it fair use for me to download books from Anna's Archive now considering that books in English are not available in my region through other means (including the vast majority of ebooks)?

Rhetorical question, we all know that me reading books is not "transformative" so it won't be considered fair use for me to yoink them (transformative as in transforming more damage to the society at large into more money for the already rich).

59. Aeolun ◴[] No.44496004{8}[source]
> But if you tried to open a black market selling that media

Why would you ever do that? Nobody would buy it. They'd just get it in the same place you did.

60. kelnos ◴[] No.44496102{3}[source]
It's not a crime in the US, either, I believe, but you can certainly be sued in civil court for it.
61. kelnos ◴[] No.44496146{5}[source]
What do you mean by "negotiating"? They can buy the books in paperback form from Amazon. And for e-books available for sale without DRM, they get to skip the cutting and scanning part.

If the book is out of print, then tough luck. That's not a license to infringe on the publisher's copyright. If we're not ok with that, we have legislative means to change that. A judge shouldn't be rewriting law in that manner.