394 points pyman | 21 comments
1. farceSpherule ◴[] No.44491907[source]
Peterson was copying and selling pirated software.

Come up with a better comparison.

replies(1): >>44491926 #
2. organsnyder ◴[] No.44491926[source]
Anthropic is selling a service that incorporates these pirated works.
replies(1): >>44492293 #
3. adolph ◴[] No.44492293[source]
That a service incorporating the authors' works exists is not at issue. The plaintiffs' claims are, as summarized by Alsup:

  First, Authors argue that using works to train Claude’s underlying LLMs 
  was like using works to train any person to read and write, so Authors 
  should be able to exclude Anthropic from this use (Opp. 16). 

  Second, to that last point, Authors further argue that the training was 
  intended to memorize their works’ creative elements — not just their 
  works’ non-protectable ones (Opp. 17).

  Third, Authors next argue that computers nonetheless should not be 
  allowed to do what people do. 

https://media.npr.org/assets/artslife/arts/2025/order.pdf
replies(4): >>44492411 #>>44492758 #>>44492890 #>>44493381 #
4. codedokode ◴[] No.44492411{3}[source]
Computers cannot learn and are not subject to laws. What happens is that a human takes a copyrighted work, makes an unauthorized digital copy, and loads it into a computer without authorization from the copyright owner.
replies(2): >>44493047 #>>44493124 #
5. xdennis ◴[] No.44492758{3}[source]
> That a service incorporating the authors' works exists is not at issue.

It's not an issue because it's not currently illegal, and nobody could have foreseen this years ago.

But it is profiting off the unpaid work of millions. And there's very little chance of change, because it's so hard to pass new protection laws when you're not Disney.

replies(3): >>44493198 #>>44493283 #>>44493456 #
6. lawlessone ◴[] No.44492890{3}[source]
> underlying LLMs was like using works to train any person to read and write

I don't think humans learn via backprop or in rounds/batches; our learning is more "online" (rough sketch below).

If I input text into an LLM, it doesn't learn from that text unless the creators deliberately include it in the next round of training their model.
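
To make that concrete, here's a toy sketch (purely illustrative; the function and the numbers are made up, and this is not any real training pipeline). Batch training revisits a frozen corpus in epochs; an online learner updates on each new input as it arrives:

  # Toy illustration only, not a real LLM pipeline. "update" stands in
  # for a single gradient step; the arithmetic is meaningless.
  def update(weights, example):
      return weights + len(example) * 0.01  # pretend gradient step

  corpus = ["text a", "text b", "text c"]  # fixed training set

  # Batch/epoch training: the model only ever sees the frozen corpus,
  # revisited over several rounds.
  weights = 0.0
  for epoch in range(3):
      for example in corpus:
          weights = update(weights, example)

  # Online learning: every fresh input changes the learner immediately.
  # A deployed LLM does NOT do this with your prompts.
  for example in ["a brand new prompt", "another one"]:
      weights = update(weights, example)

A deployed model's weights are frozen; your inputs only matter if they're folded into the corpus for the next batch run.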

Humans also don't require samples of every text in history to learn to read and write well.

Hunter S. Thompson didn't need to ingest the Harry Potter books to write.

7. KoolKat23 ◴[] No.44493047{4}[source]
And they are not selling or distributing that copy.

The model itself is something very different.

replies(1): >>44494018 #
8. adolph ◴[] No.44493124{4}[source]
It can't be "unauthorized" if no authorization was needed.
9. adolph ◴[] No.44493198{4}[source]
Marx wrote, "The tradition of all dead generations weighs like an Alp on the brains of the living," and that would be true if one were obligated to pay the full freight of one's antecedents. The more positive truth is that the brains of the living reach new heights from that Alp and build ever new heights for those who come afterwards.
10. CaptainFever ◴[] No.44493283{4}[source]
Let's not expand copyright law.
11. TeMPOraL ◴[] No.44493381{3}[source]
The first paragraph sounds absurd, so I looked into the PDF, and here's the full version I found:

> First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries, we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.

Couldn't have put it better myself (though $deity knows I tried many times on HN). Glad to see Judge Alsup continues to be the voice of common sense in legal matters around technology.

replies(2): >>44493990 #>>44494761 #
12. TeMPOraL ◴[] No.44493456{4}[source]
It's not an issue because it's not what this case was about, as the linked document explicitly states. The Authors did not contest the legality of the model's outputs, only the inputs used in training.
replies(1): >>44493731 #
13. megaman821 ◴[] No.44493731{5}[source]
Correct: the New York Times and Disney are suing over the output side. I am going to hazard a guess that you won't be able to circumvent copyright and trademark just because you are using AI. Where that line lies has yet to be determined, though.
replies(1): >>44493917 #
14. TeMPOraL ◴[] No.44493917{6}[source]
Right, but where that line is drawn will have a major impact on the near-term future of these models. If the user is liable for distributing infringing output that came from an AI, that's not a problem for the field (and IMHO a reasonable approach); but if they succeed in making the model vendors liable for the mere possibility of users generating infringing output, it'll shake things up pretty seriously.
15. cmiles74 ◴[] No.44493990{4}[source]
For everyone arguing that there’s no harm in anthropomorphizing an LLM, witness this rationalization. They talk about training and learning as if these were somehow comparable to human activities. The idea that LLM training is comparable to a person learning seems way out there to me.

“We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.”

Claude is not doing any of these things. There is no admiration, no internalizing of sweeping themes. There’s a network encoding data.

We’re talking about a machine that accepts content and then produces more content. It’s not a person; it’s owned by a corporation that earns money on literally every word this machine produces. If it didn’t have this large corpus of input data (copyrighted works), it could not produce the output data for which people are willing to pay money. This all happens at a scale no individual could achieve because, as we know, it is a machine.

replies(1): >>44494516 #
16. cmiles74 ◴[] No.44494018{5}[source]
I have to disagree: without all the copyrighted input data, there would be no output data for these companies to sell. That output data is the product, and they are distributing it for dollars.
replies(1): >>44494072 #
17. KoolKat23 ◴[] No.44494072{6}[source]
Copyright is concerned with the actual physical copy. The model isn't that. The end user would have to carefully prompt the model's algorithm to output a copyright-infringing piece.

This argument is more along the lines of blaming Microsoft Word for someone typing characters into the word processor and outputting a copy of an existing book. (Yes, it is a lot easier, but the rationale is the same.) In my mind, the end user prompting the model would be the one potentially infringing.

replies(1): >>44494226 #
18. cmiles74 ◴[] No.44494226{7}[source]
FWIW, I don’t think there is a prompt that would reliably produce, verbatim, a copyrighted work.

I do think that a big part of the reason Anthropic downloaded millions of books from pirate torrents was that they needed that input data in order to generate the output, their product.

I don’t know what the right arrangement is, but, IMHO, not sharing those dollars with the creators of the content is clearly wrong.

replies(1): >>44495306 #
19. ben_w ◴[] No.44494516{5}[source]
There may be no admiration, but there definitely is an internalising of sweeping themes, and all the other things in your quotation, which anyone can fetch by asking it for the themes/substantive points/stylistic solutions of one of the books it has (for lack of a better verb) read.

That the mechanism performing these things is a network encoding data is… well, that description, at that level of abstraction, is a similarity with the way a human does it, not even a difference.

My network is a 3D mess made of pointy lipid-bilayer bags exchanging ions across gaps moderated by the presence of neurochemicals, rather than flat sheets of silicon exchanging electrons across tuned energy band-gaps moderated by other electrons, but it's still a network.

> We’re talking about a machine that accepts content and then produces more content. It’s not a person, it’s owned by a corporation that earns money on literally every word this machine produces. If it didn’t have this large corpus of input data (copyrighted works) it could not produce the output data for which people are willing to pay money. This all happens at a scale no individual could achieve because, as we know, it is a machine.

My brain is a machine that accepts content in the form of job offers and JIRA tickets (amongst other things), and then produces more content in the form of pull requests (amongst other things). Specifically for the sake of this question, do the other things make a difference? While I count as a person and am not owned by any corporation, when I work for one, they do earn money on the words this biological machine produces. (And given all the models which are free to use, the LLMs definitely don't earn money on "literally" every word they produce.) If I didn't have my large corpus of input data, and there absolutely was copyright on a lot of the school textbooks and the TV-broadcast educational content of the 80s and 90s when I was at school, and on the Java programming language that formed the backbone of my university degree, I could not produce the output data for which people are willing to pay money.

Should corporations that hire me be required to pay Oracle every time I remember and use a solution I learned from a Java course, even when I'm not writing Java?

That the LLMs do this at a scale no individual could achieve because they are machines means they have the potential to wipe me out economically. The economic threat of automation has been a real issue at least since the Luddites, if not earlier, and I don't know how the dice will fall this time around. So even though I have one layer of backup plan, I am well aware it may not work, and if it doesn't, then government action will have to happen, because a lot of other people will be in trouble before trouble gets to me (and recent history shows that this doesn't mean "there won't be trouble").

Copyright law is one example of government action. So is mandatory education. So is UBI, but so too is feudalism.

Good luck to us all.

20. losvedir ◴[] No.44494761{4}[source]
> Glad to see Judge Alsup continues to be the voice of common sense in legal matters around technology

Yep, that name's a blast from the past! He was the judge on the big Google/Oracle case about Android and Java years ago, IIRC. I think he even learned to write some Java so he could better understand the case.

21. ◴[] No.44495306{8}[source]