
397 points pyman | 10 comments
dehrmann ◴[] No.44491718[source]
The important parts:

> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use

> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"

It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.

replies(6): >>44491820 #>>44491944 #>>44492844 #>>44494100 #>>44494132 #>>44494944 #
6gvONxR4sf7o ◴[] No.44491944[source]
You skipped quotes about the other important side:

> But Alsup drew a firm line when it came to piracy.

> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."

That is, he ruled that

- buying, physically cutting up, physically digitizing books, and using them for training is fair use

- pirating the books for their digital library is not fair use.

replies(6): >>44492103 #>>44492512 #>>44492665 #>>44493580 #>>44493641 #>>44495079 #
pier25 ◴[] No.44493580[source]
> buying, physically cutting up, physically digitizing books, and using them for training is fair use

So Suno would only really need to buy the physical albums and rip them to be able to generate music at an industrial scale?

replies(7): >>44493615 #>>44493850 #>>44494405 #>>44494753 #>>44494779 #>>44495203 #>>44496071 #
conradev ◴[] No.44494779[source]
Yes! Training and generation are fair use. You are free to train and generate whatever you want in your basement for whatever purpose you see fit. Build a music collection, go ham.

If the output from said model uses the voice of another person, for example, we already have a legal framework in place for determining if it is infringing on their rights, independent of AI.

Courts have heard cases of individual artists copying melodies, because melodies themselves are copyrightable: https://www.hypebot.com/hypebot/2020/02/every-possible-melod...
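The linked "every possible melody" project brute-forced melodies as combinations of notes. A minimal sketch of the combinatorics, assuming the project's reported parameters of 8 candidate pitches and 12-note melodies (any other figures here are illustrative, not from the thread):

```python
from itertools import product

# With 8 pitches and 12 positions, the number of distinct melodies is 8**12,
# roughly 68.7 billion -- the scale the project reportedly worked through.
pitches = 8
melody_len = 12
total = pitches ** melody_len
print(total)  # 68719476736

# A tiny version (3 pitches, length 2) small enough to enumerate outright,
# just to show the mechanism: every melody is one tuple from the product.
small = list(product(range(3), repeat=2))
print(len(small))  # 9 melodies: (0, 0), (0, 1), ..., (2, 2)
```

The point of the project was that this space is finite and enumerable, which is what makes "independent" melodic overlap so likely.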

Copyright law is a lot more nuanced than anyone seems to have the attention span for.

replies(1): >>44494822 #
1. pier25 ◴[] No.44494822[source]
> Yes!

But Suno is definitely not training models in their basement for fun.

They are a private company selling music, using music made by humans to train their models, to replace human musicians and artists.

We'll see what the courts say but that doesn't sound like fair use.

replies(1): >>44495390 #
2. conradev ◴[] No.44495390[source]
My understanding is that Suno does not sell music, but instead makes a tool for musicians to generate music and sells access to this tool.

The law doesn't distinguish between basement and cloud – it's a service. You can sell access to the service without selling songs to consumers.

replies(3): >>44495595 #>>44495608 #>>44495707 #
3. pier25 ◴[] No.44495595[source]
That's like arguing that a restaurant doesn't sell food because it sells the service of cooking it.
replies(1): >>44495997 #
4. pyman ◴[] No.44495608[source]
What does "fair use" even mean in a world where models can memorise and remix every book and song ever written? Are we erasing ownership?

The problem is, copyright law wasn't written for machines. It was written for humans who create things.

In the case of songs (or books, paintings, etc), only humans and companies can legally own copyright, a machine can't. If an AI-powered tool generates a song, there’s no author in the legal sense, unless the person using the tool claims authorship by saying they operated the tool.

So we're stuck in a grey zone: the input is human, the output is AI generated, and the law doesn't know what to do with that.

For me the real debate is: Do we need new rules for non-human creation?

replies(1): >>44495950 #
5. johnnyanmac ◴[] No.44495707[source]
That doesn't seem to track in my mind. So you can't sell music, but you can sell 10-second snippets of music you pirated? It doesn't math out.

But i guess I'm not surprised that 2025 has little respect for artists.

6. markhahn ◴[] No.44495950{3}[source]
Why are you saying "memorize"? Are people training AIs to regurgitate exact copies? If so, that's just copying. If they return something that is not a literal copy of the whole work, then there is established caselaw about how much is permitted: some clearly is, but not entire works.

When you buy a book, you are not agreeing to a license that says you may only ever read it with human eyes, never memorize it, never quote it, never be inspired by it.

replies(2): >>44496065 #>>44496175 #
7. conradev ◴[] No.44495997{3}[source]
The restaurant isn't responsible for E. coli in its ingredients, is it? It's only responsible for cooking it out of the food.

Suno can’t prevent humans from copying other humans, it can only make sure that the direct output of its system isn’t infringing.

8. mwarkentin ◴[] No.44496065{4}[source]
> Specifically, the paper estimates that Llama 3.1 70B has memorized 42 percent of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time. (I’ll unpack how this was measured in the next section.)

> Interestingly, Llama 1 65B, a similar-sized model released in February 2023, had memorized only 4.4 percent of Harry Potter and the Sorcerer's Stone. This suggests that despite the potential legal liability, Meta did not do much to prevent memorization as it trained Llama 3. At least for this book, the problem got much worse between Llama 1 and Llama 3.

> Harry Potter and the Sorcerer's Stone was one of dozens of books tested by the researchers. They found that Llama 3.1 70B was far more likely to reproduce popular books—such as The Hobbit and George Orwell’s 1984—than obscure ones. And for most books, Llama 3.1 70B memorized more than any of the other models.
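The measurement described in the quoted paper (reproducing 50-token excerpts from a 50-token prefix) can be sketched as a simple probe harness. This is a hedged illustration, not the paper's actual code: it uses greedy word-level "tokens" and a toy stand-in model that has fully memorized its training text, so the harness itself runs without a real LLM. `make_memorizing_model` and `memorized_fraction` are names invented here.

```python
def make_memorizing_model(text_tokens):
    """Toy stand-in for an LLM: echoes the continuation it saw in 'training'."""
    def generate(prefix, n_tokens):
        # Find the prefix in the memorized token stream and return what follows.
        for i in range(len(text_tokens) - len(prefix)):
            if text_tokens[i:i + len(prefix)] == prefix:
                start = i + len(prefix)
                return text_tokens[start:start + n_tokens]
        return []  # prefix never seen: no continuation
    return generate

def memorized_fraction(model, tokens, prefix_len=50, excerpt_len=50, stride=50):
    """Fraction of sliding windows whose next 50 tokens the model reproduces."""
    hits = total = 0
    for i in range(0, len(tokens) - prefix_len - excerpt_len, stride):
        prefix = tokens[i:i + prefix_len]
        target = tokens[i + prefix_len:i + prefix_len + excerpt_len]
        if model(prefix, excerpt_len) == target:
            hits += 1
        total += 1
    return hits / total if total else 0.0

book = ("the boy who lived " * 200).split()  # placeholder token stream
model = make_memorizing_model(book)
print(memorized_fraction(model, book))  # fully memorizing toy model -> 1.0
```

A real measurement would swap the toy model for an actual LLM's decoder and, as the paper apparently did, score a window as memorized when the excerpt is reproduced at least half the time rather than on a single greedy pass.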

9. pyman ◴[] No.44496175{4}[source]
You are comparing AI to humans, but they're not the same. Humans don't memorise millions of copyrighted works and spit out similar content. AI does that.

Memorising isn't wrong but when machines memorise at scale and the people behind the original work get nothing, it raises big ethical questions.

The law hasn't caught up.

replies(1): >>44496684 #
10. bongodongobob ◴[] No.44496684{5}[source]
As a former musician, yes, we do. Any above average musician can play "Riders on the Storm" in the style of Johnny Cash, or Green Day, or Nirvana, etc. Successful above average musicians usually have almost encyclopedic knowledge of artists and albums at least in their favorite genre. This is how all art is made. Some artists will be more honest about this than others.