
397 points pyman | 9 comments
dehrmann ◴[] No.44491718[source]
The important parts:

> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use

> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"

It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.

replies(6): >>44491820 #>>44491944 #>>44492844 #>>44494100 #>>44494132 #>>44494944 #
6gvONxR4sf7o ◴[] No.44491944[source]
You skipped quotes about the other important side:

> But Alsup drew a firm line when it came to piracy.

> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."

That is, he ruled that

- buying, physically cutting up, physically digitizing books, and using them for training is fair use

- pirating the books for their digital library is not fair use.

replies(6): >>44492103 #>>44492512 #>>44492665 #>>44493580 #>>44493641 #>>44495079 #
throwawayffffas ◴[] No.44492103[source]
So all they have to do is go and buy a copy of each book they pirated. They will have ceased and desisted.
replies(3): >>44492200 #>>44492352 #>>44493451 #
superfrank ◴[] No.44492200[source]
I'm trying to find the quote, but I'm pretty sure the judge specifically said that going and buying the book after the fact won't absolve them of liability. He said that for the books they pirated they broke the law and should stand trial for that, and they cannot go back and un-break it by buying a copy now.

Found it: https://www.nbcnews.com/tech/tech-news/federal-judge-rules-c...

> “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” [Judge] Alsup wrote, “but it may affect the extent of statutory damages.”

replies(4): >>44492716 #>>44492936 #>>44493820 #>>44493889 #
freejazz ◴[] No.44492936[source]
They also argued that they in no way could ever actually license all the materials they ingested
replies(1): >>44493194 #
dmd ◴[] No.44493194[source]
I love this argument so much. "But judge, there's no way I could ever afford to buy those jewels, so stealing them must be OK."
replies(1): >>44493585 #
AnthonyMouse ◴[] No.44493585[source]
The argument is more along the lines of, negotiating with millions of individuals each over a single copy of a work would cause the transaction costs to exceed the payments, and that kind of efficiency loss is the sort of thing fair use exists to prevent. It's not socially beneficial for the law to require you to create $2 in deadweight loss in order to transfer $1, and the cost to the author of not selling a single additional copy is not the thing they were really objecting to.
replies(6): >>44493769 #>>44493884 #>>44495038 #>>44495745 #>>44495819 #>>44496146 #
1. freejazz ◴[] No.44493884[source]
> and that kind of efficiency loss is the sort of thing fair use exists to prevent.

No it's not. And you ever heard of a publishing house? They don't need to negotiate with every single author individually. That's preposterous.

replies(2): >>44493908 #>>44495777 #
2. AnthonyMouse ◴[] No.44493908[source]
It kind of is though?

It's not the only reason fair use exists, but it's the thing that allows e.g. search engines to exist, and that seems pretty important.

> And you ever heard of a publishing house? They don't need to negotiate with every single author individually. That's preposterous.

There are thousands of publishing houses and millions of self-published authors on top of that. Many books are also out of print or have unclear rights ownership.

replies(1): >>44494139 #
3. freejazz ◴[] No.44494139[source]
>It kind of is though?

No, it kinda isn't. Show me anything that supports this idea beyond your own immediate conjecture right now.

>It's not the only reason fair use exists, but it's the thing that allows e.g. search engines to exist, and that seems pretty important.

No, that's the transformative element of what a search engine provides. Search engines are not legal because they can't contact each licensor, they are legal because they are considered hugely transformative features.

>There are thousands of publishing houses and millions of self-published authors on top of that. Many books are also out of print or have unclear rights ownership.

Okay, and? How many customers does Microsoft bill on a monthly basis?

replies(1): >>44494442 #
4. AnthonyMouse ◴[] No.44494442{3}[source]
> Show me anything that supports this idea beyond your own immediate conjecture right now

It's inherent in the nature of the test. The most important fair use factor is the effect on the market for the work, so if the use would be uneconomical without fair use then the effect on the market is negligible because the alternative would be that the use doesn't happen rather than that the author gets paid for it.

> No, that's the transformative element of what a search engine provides. Search engines are not legal because they can't contact each licensor, they are legal because they are considered hugely transformative features.

To make a search engine you have to do two things. One is to download a copy of the whole internet, the other is to create a search index. I'm talking about the first one, you're talking about the second one.

> Okay, and? How many customers does Microsoft bill on a monthly basis?

Microsoft does this with an automated system. There is no single automated system where you can get every book ever written, and separately interfacing with all of the many systems needed in order to do it is the source of the overhead.

replies(1): >>44494815 #
5. freejazz ◴[] No.44494815{4}[source]
>It's inherent in the nature of the test. The most important fair use factor is the effect on the market for the work, so if the use would be uneconomical without fair use then the effect on the market is negligible because the alternative would be that the use doesn't happen rather than that the author gets paid for it.

No, that's not the most important factor. The transformative factor is the most important. Effect on market for the work doesn't even support your argument anyway. Your argument is about the cost of making the end product, which is totally distinct from the market effects on the copyright holder when the infringer makes and releases the infringing product.

>To make a search engine you have to do two things. One is to download a copy of the whole internet, the other is to create a search index. I'm talking about the first one, you're talking about the second one.

So? That doesn't make you right. Go read the opinions, dude. This isn't something that's actually up for debate. Search engines are fair uses because of their transformative effect, not because they are really expensive otherwise. Your argument doesn't even make sense. By that logic, anything that's expensive becomes a fair use. It's facially ridiculous. Them being expensive is neither sufficient nor necessary for them to be a fair use. Their transformative nature is both sufficient and necessary to be found a fair use. Full stop.

>Microsoft does this with an automated system. There is no single automated system where you can get every book ever written, and separately interfacing with all of the many systems needed in order to do it is the source of the overhead.

Okay, and? They don't need to get every single book ever written. The libraries they pirated do not consist of "every single book ever written". It's hard to take this argument in good faith because you're being so ridiculous.

replies(1): >>44495073 #
6. AnthonyMouse ◴[] No.44495073{5}[source]
> No, that's not the most important factor. The transformative factor is the most important.

It's a four factor test because all of the factors are relevant, but if the use has negligible effect on the market for the work then it's pretty hard to get anywhere with the others. For example, for cases like classroom use, even making verbatim copies of the entire work is often still fair use. Buying a separate copy for each student to use for only a few minutes would make that use uneconomical.

> Effect on market for the work doesn't even support your argument anyway. Your argument is about the cost of making the end product, which is totally distinct from the market effects on the copyright holder when the infringer makes and releases the infringing product.

We're talking about the temporary copies they make during training. Those aren't being distributed to anyone else.

> So? That doesn't make you right.

Making a copy of everything on the internet is a prerequisite to making a search engine. It's something you have to do as a step to making the index, which is the transformative step. Are you suggesting that doing the first step is illegal or what do you propose justifies it?

> By that logic, anything that's expensive becomes a fair use. It's facially ridiculous.

Anything with unreasonably high transaction costs. Why is that ridiculous? It doesn't exempt any of the normal stuff like an individual person buying an individual book.

> They don't need to get every single book ever written.

They need to get as many books as possible, with the platonic ideal being every book. Whether or not the ideal is feasible in practice, the question is whether it's socially beneficial to impose a situation with excessively high transaction costs in order to require something with only trivial benefit to authors (potentially selling one extra copy).

replies(1): >>44495469 #
7. freejazz ◴[] No.44495469{6}[source]
>It's a four factor test because all of the factors are relevant, but if the use has negligible effect on the market for the work then it's pretty hard to get anywhere with the others. For example, for cases like classroom use, even making verbatim copies of the entire work is often still fair use. Buying a separate copy for each student to use for only a few minutes would make that use uneconomical.

Not all four factors are equally relevant, which is something described in pretty much every single fair use opinion. Educational uses are educational uses and considered fair because of their educational purpose (purpose is one of the factors), again, not because it's expensive. Maybe next time try googling or using ChatGPT "fair use educational".

>We're talking about the temporary copies they make during training. Those aren't being distributed to anyone else.

It's your argument. Not mine. You do not understand the market harm factor and it has nothing to do with Anthropic's transaction costs. That's just a fully, outright, absolutely incorrect application of law.

>Making a copy of everything on the internet is a prerequisite to making a search engine. It's something you have to do as a step to making the index, which is the transformative step. Are you suggesting that doing the first step is illegal or what do you propose justifies it?

The transformative step is why it's a fair use, not the "market harm" (which you misunderstand) or the made up argument that it's "too expensive". In fact, I said this like every single turn in our conversation so it's a bit perplexing to me that you can now ask me "do you mean that it being transformative is what makes it legal" when that was my exact argument three times.

>Anything with unreasonably high transaction costs. Why is that ridiculous? It doesn't exempt any of the normal stuff like an individual person buying an individual book.

It's ridiculous because of the example I gave. Things being expensive is not a defense to copyright infringement and copyright law has no obligation to make expensive business models work. Copyright has an obligation to make transformative business models work because of the overall good they provide to society. Describing it as a "transaction cost" just kicks the can down the road even further and doesn't deal with the substance, either. They could have gone to the major publishers and licensed books from them. They didn't. That's generally who they are being sued by. When they are being sued by copyright owners in the fringe examples you pointed to, they will become relevant then.

>They need to get as many books as possible, with the platonic ideal being every book. Whether or not the ideal is feasible in practice, the question is whether it's socially beneficial to impose a situation with excessively high transaction costs in order to require something with only trivial benefit to authors (potentially selling one extra copy).

Lol dude, it was your example, not mine. They do not need every single book. They aren't being sued over every single book anyway, so it's totally beside the point.

8. johnnyanmac ◴[] No.44495777[source]
>They don't need to negotiate with every single author individually.

Yeah they do. What do you think the employees of a publishing house do? They make deals, work with authors, and accept/reject pitches. They 100% need to make sure every work is under a negotiated contract.

replies(1): >>44496425 #
9. freejazz ◴[] No.44496425[source]
The publishers could license the works in bulk, without the need for Anthropic to deal with the individual authors. Both sides pointed this out.