393 points pyman | 55 comments
1. pyman ◴[] No.44488332[source]
Anthropic's cofounder, Ben Mann, downloaded millions of copies of books from Library Genesis in 2021, fully aware that the material was pirated.

Stealing is stealing. Let's stop with the double standards.

replies(8): >>44488391 #>>44488540 #>>44488816 #>>44490720 #>>44491032 #>>44491583 #>>44492035 #>>44493242 #
2. damnesian ◴[] No.44488391[source]
oh well, the product has a cute name and will make someone a billionaire, let's just give it the green light. who cares about copyright in the age of AI?
3. originalvichy ◴[] No.44488540[source]
At least most pirates just consume for personal use. Profiting from piracy is a whole other level beyond just pirating a book.
replies(4): >>44488621 #>>44488853 #>>44489003 #>>44490718 #
4. pyman ◴[] No.44488621[source]
Someone on Twitter said: "Oh well, P2P mp3 downloads, although illegal, made contributions to the music industry"

That's not what's happening here. People weren't downloading music illegally and reselling it on Claude.ai. And while P2P networks led to some great tech, there's no solid proof they actually improved the music industry.

replies(2): >>44489039 #>>44489127 #
5. x3n0ph3n3 ◴[] No.44488816[source]
Copyright infringement is not stealing.
replies(4): >>44488893 #>>44488987 #>>44490404 #>>44490503 #
6. mnky9800n ◴[] No.44488853[source]
I feel like profit was always a central motive of pirates. At least from the historical documents known as, "The Pirates of the Caribbean".
7. 1oooqooq ◴[] No.44488893[source]
actually, the only time it's an (ethical) crime is when a corporation does it at scale for profit.
8. pyman ◴[] No.44488987[source]
Pirating a book and selling it on claude.ai is stealing, both legally and morally.
replies(4): >>44489073 #>>44489353 #>>44489445 #>>44492112 #
9. KoolKat23 ◴[] No.44489003[source]
This isn't really profiting from piracy. They don't make money off the raw input data. It's no different to consuming for personal use.

They make money off the model weights, which is fair use (as confirmed by recent case law).

replies(1): >>44489216 #
10. drcursor ◴[] No.44489039{3}[source]
Let's not forget Spotify ;)

https://gizmodo.com/early-spotify-was-built-on-pirated-mp3-f...

replies(1): >>44489250 #
11. zb3 ◴[] No.44489073{3}[source]
Who got robbed? Just because I'd pay for AI it doesn't mean I'd buy these books.
replies(1): >>44489291 #
12. Imustaskforhelp ◴[] No.44489127{3}[source]
I really feel as if Youtube is the best sort of convenience for music videos where most people watch ads whereas some people can use an ad blocker.

I use an adblocker and tbh I think so many people on HN are okay with ad blocking and not piracy when basically both just block the end user from earning money.

I kind of believe that if you really like a piece of software, or really like anything, you should just ask the people behind it what their favourite charity is and donate there, or join their Patreon or find another direct way to support them.

replies(2): >>44491147 #>>44493551 #
13. j_w ◴[] No.44489216{3}[source]
This is absurd. Remove all of the content from the training data that was pirated and what is the quality of the end product now?
replies(2): >>44489279 #>>44489283 #
14. pyman ◴[] No.44489250{4}[source]
Those claims were never proved.
15. pyman ◴[] No.44489279{4}[source]
With Claude, people are paying Anthropic to access answers that are generated from pirated books, without the authors' permission, credit, or compensation.
replies(1): >>44489304 #
16. KoolKat23 ◴[] No.44489283{4}[source]
That's the law.

Please keep in mind, copyright is intended as a compromise between benefit to society and to the individual.

A thought experiment: what about students pirating textbooks and applying that knowledge later on in their work?

replies(2): >>44489587 #>>44495512 #
17. pyman ◴[] No.44489291{4}[source]
You should ask the teachers who spent years writing those books.
replies(2): >>44491497 #>>44492124 #
18. KoolKat23 ◴[] No.44489304{5}[source]
There is no copyright on knowledge.

If it outputs parts of the book verbatim then that's a different story.

replies(2): >>44489612 #>>44492025 #
19. BlackFly ◴[] No.44489353{3}[source]
Making a copy differs from taking an existing object in all aspects: literally, technically, legally and ethically. Piracy is making a copy you have no legal right to. Stealing is taking a physical object that you have no legal right to. While the "no legal right to" seems the same superficially, in practice the laws differ quite a bit because the literal, technical and ethical aspects differ.
20. TiredOfLife ◴[] No.44489445{3}[source]
They are not selling it on claude.ai. If you can prove that they are you will be rich.
21. j_w ◴[] No.44489587{5}[source]
When you say that's the law, as far as I'm aware a single ruling by a lower court has been issued which upholds that application. Hardly settled case law.
replies(1): >>44489760 #
22. pyman ◴[] No.44489612{6}[source]
Let's not change the focus of the debate.

Pirating 7 million books, remixing their content, and using that to power Claude.ai is like counterfeiting 7 million branded products and selling them on your personal website. The original creators don't get credit or payment, and someone’s profiting off their work.

All this happens while authors, many of them teachers, are left scratching their heads with four kids to feed.

replies(1): >>44489775 #
23. KoolKat23 ◴[] No.44489760{6}[source]
True, until then best to act as if it is the case.

In my opinion, it will be upheld.

Looking at what is stored and the manner in which it is stored, it makes sense that it's fair use.

replies(1): >>44492896 #
24. KoolKat23 ◴[] No.44489775{7}[source]
That may be the case, but you'd have to have laws changed.
25. seydor ◴[] No.44490404[source]
property infringement isn't either?
replies(1): >>44491729 #
26. impossiblefork ◴[] No.44490503[source]
It's very similar to theft of service.

There's so many texts, and they're so sparse that if I could copyright a work and never publish it, the restriction would be irrelevant. The probability that you would accidentally come upon something close enough that copyright was relevant is almost infinitesimal.

Because of this copyright is an incredibly weak restriction, and that it is as weak as it is shows clearly that any use of a copyrighted work is due to the convenience that it is available.

That is, it's about making use of the work somebody else has done, not about that restricting you somehow.

Therefore copyright is much more legitimate than ordinary property. Ordinary property, especially ownership of land, can actually limit other people. But since copyright is so sparse, infringing on it is like going to a world with near-infinite space, picking the precise place where somebody has planted a field, and deciding to harvest from that particular field.

Consequently I think copyright infringement might actually be worse than stealing.

replies(2): >>44491877 #>>44492988 #
27. mrcwinn ◴[] No.44490718[source]
> At least most pirates just consume for personal use.

Easy for the pirate to say. Artists might argue their intent was to trade compensation for one's personal enjoyment of the work.

replies(2): >>44491100 #>>44491137 #
28. Der_Einzige ◴[] No.44490720[source]
Information wants to be free.
replies(1): >>44491829 #
29. dathinab ◴[] No.44491032[source]
stealing with the intent to gain an unfair market advantage, so that you can effectively kill any company acting ethically and legally, in a way which is very likely going to hurt many authors through the products you create, is far worse than just stealing for personal use

that isn't "just" stealing, it's organized crime

30. Workaccount2 ◴[] No.44491100{3}[source]
The gut punch of being a photographer selling your work on display, someone walks by and lines up their phone to take a perfect picture of your photograph, and then exclaims to you "Your work is beautiful! I can't wait to print this out and put it on my wall!"
31. jobs_throwaway ◴[] No.44491137{3}[source]
All the evidence shows that piracy is good for artists' business. You make a good work, people are exposed to it through piracy, and they end up buying more of your stuff than they would otherwise. But keep crying about the artist's plight
replies(1): >>44491539 #
32. Workaccount2 ◴[] No.44491147{4}[source]
If you are someone who can think clearly, it's extremely obvious that the conversation around copyright, LLMs, piracy, and ad-blocking is

"What serves me personally the best for any given situation" for 95% of people.

33. azangru ◴[] No.44491497{5}[source]
You keep saying the word "teachers"; but that word does not appear in the text of the article. Why focus on the teachers in particular?

Also, there are various incentives for teachers to publish books. Money is just one of them (I wonder how much revenue books bring to the teachers). Prestige and academic recognition is another. There are probably others still. How realistic is the depiction of a deprived teacher whose livelihood depended on the books he published once every several years?

34. SketchySeaBeast ◴[] No.44491539{4}[source]
The way you've presented this, the evidence is just "common sense", which isn't much evidence at all.
35. 1970-01-01 ◴[] No.44491583[source]
Let's get actual definitions of 'theft' before we leap into double standards.
36. eviks ◴[] No.44491729{3}[source]
If you infringe by destroying property, then yes, it's not stealing
37. troyvit ◴[] No.44491829[source]
Then why does Claude cost money?
38. jpalawaga ◴[] No.44491877{3}[source]
you've created a very obvious category mistake in your final summary by confusing intellectual property--which can be copied at no penalty to an owner (except nebulous 'alternate universe' theories)--with actual property, and a farmer and his land, with a crop that cannot be enjoyed twice.

you're saying copying a book is worse than robbing a farmer of his food and/or livelihood, which cannot be replaced or duplicated. Meanwhile, someone who copies a book does not deprive the author of selling the book again (or of the tasty proceeds from a harvest).

I can't say I agree, for obvious reasons.

replies(1): >>44492100 #
39. SirMaster ◴[] No.44492025{6}[source]
>If it outputs parts of the book verbatim then that's a different story.

But it does...

40. NoMoreNicksLeft ◴[] No.44492035[source]
>Stealing is stealing.

Yes, but copying isn't stealing, because the person you "take" from still has their copy.

If you're allowed to call copying stealing, then I should be allowed to call hysterical copyright rabblerousing rape. Quit being a rapist, pyman.

41. impossiblefork ◴[] No.44492100{4}[source]
With this special infinite-land-land though, what's special about the farmer's land is that he's expended energy to make it that way, just as the author has expended energy to find his text.

Just as the farmer obtains his livelihood from the investment-of-energy-to-raise-crops-to-energy cycle the author has his livelihood by the investment-of-energy-to-finding-a-useful-work-to-energy cycle.

So he is in fact robbed in a very similar way.

replies(1): >>44493941 #
42. thedevilslawyer ◴[] No.44492112{3}[source]
Where can I download Harry Potter on claude.ai pls?
replies(1): >>44492119 #
43. slater ◴[] No.44492119{4}[source]
Why would you want to download a shitty book?
44. zb3 ◴[] No.44492124{5}[source]
I did not ask them to write those books, and I wouldn't buy those.
45. j_w ◴[] No.44492896{7}[source]
We're talking about a summary judgement issued that has not yet been appealed. That doesn't make it "settled."

If "what is stored and the manner which it is stored" is intended to mean model weights, I'm not sure what the argument is. The four fair use factors in no way mention a storage medium for data, lossless or lossy.

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.

In my opinion, this will likely see a supreme court ruling by the end of the decade.

replies(1): >>44493389 #
46. CaptainFever ◴[] No.44492988{3}[source]
> Consequently I think copyright infringement might actually be worse than stealing.

I remember when piracy wasn't theft, and information wanted to be free.

replies(1): >>44494093 #
47. kube-system ◴[] No.44493242[source]
> Stealing is stealing. Let's stop with the double standards.

I get the sentiment, but that statement as is, is absurdly reductive. Details matter. Even if someone takes merchandise from a store without paying, their sentence will vary depending on the details.

48. KoolKat23 ◴[] No.44493389{8}[source]
The use is to train an AI model.

A trillion parameter SOTA model is not substantially comprised of the one copyrighted piece. (If it was a Harry Potter model trained only on Harry Potter books this would be a different story).

Embeddings are not copy paste.

The last point about market impact would be where they make their argument but it's tenuous. It's not the primary use of AI models and built in prompts try to avoid this, so it shouldn't be commonplace unless you're jail breaking the model, most folk aren't.

replies(1): >>44495528 #
49. timeon ◴[] No.44493551{4}[source]
I think that critique of this case is not about piracy in itself but how these companies are treated by courts vs. how individuals are treated.
50. jpalawaga ◴[] No.44493941{5}[source]
You're saying that a copy of a digital thing is the same as the "only" of a physical thing. But that's not true. You can't sell grain twice, but you can sell a movie many times (especially when you account for format changes, remasterings, platform locks, licensing for special usecases like remixing, broadcasts, etc).

You'd have to steal the author's ownership of the intellectual property in order for the comparison to be valid, just as you stole ownership of his crop.

Separately, there is a reason why theft and copyright infringement are two distinct concepts in law.

replies(1): >>44494203 #
51. impossiblefork ◴[] No.44494093{4}[source]
So do I, then I found this reasoning I presented in my comment and realised that piracy was actually quite bad.

Ordinary property is much worse than copyright, which is both time limited and not necessarily obtained through work, and which is much more limited in availability than the number of sequences.

When someone owns land, that's actually a place you stumble upon and can't enter, whereas you're not going to ever stumble upon the story of even 'Nasse hittar en stol' (Swedish for 'Nasse finds a chair'), a very short book for very small children.

52. impossiblefork ◴[] No.44494203{6}[source]
The difference here though is that the copyright holder sustains himself by the sales of his particular chosen text, so it doesn't matter that the text can be reproduced infinitely.
replies(1): >>44495782 #
53. nwienert ◴[] No.44495512{5}[source]
It's the law (for now; it's very early in the process of deciding the law: untested, appealable, likely to be appealed and tested many times in many ways).

Meanwhile other cases have been less friendly to it being fair use, AI companies are already paying vast sums to publishers, which presumably they wouldn’t if they felt confident it was “the law”, and on and on.

I don’t like arguing from “it’s the law”. A lot of law is terrible. What’s right? It’s clear to me that if AI gets good enough, as it nearly is now, it sucks a lot of profit away from creators. That is unbalanced. The AI doesn’t exist without the creators, the creators need to exist for our society to be great (we want new creative works, more if anything). Law tends to start conservatively based on historical precedent, and when a new technology comes along it often errs on letting it do some damage to avoid setting a bad precedent. In time it catches up as society gets a better view of things.

The right thing is likely not to let our creative class be decimated so a few tech companies become fantastically wealthy - in the long run, it’s the right thing even for the techies.

54. nwienert ◴[] No.44495528{9}[source]
I bet it’s pretty easy to reproduce enough of Harry Potter from these models that any judge would see it as not fair use - you’d just have to prompt it in the right way. I’d bet a large sum that when this eventually shakes through the Supreme Court, it won’t be deemed fair use entirely, for the better of the world.
55. jpalawaga ◴[] No.44495782{7}[source]
If you assume the only way people are obtaining the media is by unlicensed reproduction, then it doesn’t matter.

Big if. Practically, the movie studios aren’t poor because their product has instances of infringement.