
451 points croes | 115 comments
1. mattxxx ◴[] No.43962976[source]
Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology
2. Matches a potentially outdated, strict interpretation of copyright law

My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use-case, and then we should change them.

replies(19): >>43963017 #>>43963125 #>>43963168 #>>43963214 #>>43963243 #>>43963311 #>>43963423 #>>43963517 #>>43963612 #>>43963721 #>>43963943 #>>43964079 #>>43964280 #>>43964365 #>>43964448 #>>43964562 #>>43965792 #>>43965920 #>>43976732 #
2. madeofpalk ◴[] No.43963017[source]
> Humans can read a book, get inspiration, and write a new book and not be litigated against

Humans get litigated against for this all the time. There is such a thing as, charitably, being too inspired.

https://en.wikipedia.org/wiki/List_of_songs_subject_to_plagi...

replies(1): >>43963509 #
3. ActionHank ◴[] No.43963125[source]
Assuming this means copyright is dead, companies will be very upset, and patents will likely follow.

The hold US companies have on the world will be dead too.

I also suspect that media piracy will be labelled as the only reason we need copyright, an existing agency will be bolstered to address this concern and then twisted into a censorship bureau.

4. timdiggerm ◴[] No.43963214[source]
Or we could acknowledge that something could be a bad idea, despite its utility
5. stevenAthompson ◴[] No.43963243[source]
Doing a cover song requires permission, and doing it without that permission can be illegal. Being inspired by a song to write your own is very legal.

AI is fine as long as the work it generates is substantially new and transformative. If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem.

Yes, I'm aware that machines aren't people and can't be "inspired", but if the functional results are the same the law should be the same. Vaguely defined ideas like your soul or "inspiration" aren't real. The output is real, measurable, and quantifiable and that's how it should be judged.

replies(3): >>43963561 #>>43963629 #>>43964441 #
6. jeroenhd ◴[] No.43963311[source]
Pirating movies is also useful, because I can watch movies without paying on devices that apps and accounts don't work on.

That doesn't make piracy legal, even though I get a lot of use out of it.

Also, a person isn't a computer so the "but I can read a book and get inspired" argument is complete nonsense.

replies(2): >>43963560 #>>43964460 #
7. vessenes ◴[] No.43963423[source]
Thank you - a voice of sanity on this important topic.

I understand people who create IP of any sort being upset that software might be able to recreate their IP or stuff adjacent to it without permission. It could be upsetting. But I don't understand how people jump to "Copyright Violation" for the fact of reading. Or even downloading in bulk. Copyright controls, and has always controlled, the creation and distribution of a work. Embedded in the very nature of the copyright notice is the concept that the work will be read.

Reading and summarizing have only ever been controlled in western countries via State's secrets type acts, or alternately, non-disclosure agreements between parties. It's just way, way past reality to claim that we have existing laws to cover AI training ingesting information. Not only do we not, such rules would seem insane if you substitute the word human for "AI" in most of these conversations.

"People should not be allowed to read the book I distributed online if I don't want them to."

"People should not be allowed to write Harry Potter fanfic in my writing style."

"People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

We just will not get to a sensible societal place if the dialogue around these issues sets such a low bar for understanding the mechanics and the societal tradeoffs we've made so far, and is unable to discuss where we might want to go and what would be best.

replies(3): >>43963908 #>>43964370 #>>43964770 #
8. jobigoud ◴[] No.43963464[source]
We are talking about the rights of the humans training the models and the humans using the models to create new things.

Copyright only comes into play on publication. It's only concerned about publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.

replies(5): >>43963564 #>>43964130 #>>43964131 #>>43964631 #>>43965405 #
9. ulbu ◴[] No.43963480[source]
these comparisons of llms with human artists copying are just ridiculous. it’s saying “well humans are allowed to break twigs and damage the planet in various ways, so why not allow building a fucking DEATH STAR”.

abstracting llms from their operators and owners and possible (and probable) ends and the territories they trample upon is nothing short of eye-popping to me. how utterly negligent and disrespectful of fellow people must one be at the heart to give any credence to such arguments

replies(3): >>43964105 #>>43964159 #>>43964449 #
10. jrajav ◴[] No.43963509[source]
If you follow these cases more closely over time you'll find that they're less an example of humans stealing work from others and more an example of typical human greed and pride. Old, well-established musicians arguing that younger musicians stole from them for using a chord progression used in dozens of songs before their own original work, or a melody on the pentatonic scale that sounds like many melodies on the pentatonic scale do. It gets ridiculous.

Plus, all art is derivative in some sense, it's almost always just a matter of degree.

replies(2): >>43966745 #>>43969065 #
11. ceejayoz ◴[] No.43963517[source]
> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

You're still not gonna be allowed to commercially publish "Hairy Plotter and the Philosophizer's Rock".

replies(2): >>43963660 #>>43966769 #
12. Workaccount2 ◴[] No.43963560[source]
It's only complete nonsense if you understand how humans learn. Which we don't.

What we do know though is that LLMs, similar to humans, do not directly copy information into their "storage". LLMs, like humans, are pretty lossy with their recall.

Compare this to something like a search indexed database, where the recall of information given to it is perfect.

replies(1): >>43964910 #
13. toast0 ◴[] No.43963561[source]
> Doing a cover song requires permission, and doing it without that permission can be illegal.

I believe cover song licensing is available mechanically; you don't need permission, you just need to follow the procedures including sending the licensing fees to a rights clearing house. Music has a lot of mechanical licenses and clearing houses, as opposed to other categories of works.

replies(1): >>43965692 #
14. MyOutfitIsVague ◴[] No.43963564{3}[source]
It's not only publication, otherwise people wouldn't be able to be successfully sued for downloading and consuming copyrighted content, it would only be the uploaders who get into trouble.
replies(1): >>43963945 #
15. mjburgess ◴[] No.43963629[source]
I fear the lack of our ability to measure your mind might render you without many of the legal or moral protections you imagine you have. But go ahead, tear down the law to whatever inanity can be described by the trivial machines of the world's current popular charlatans. Presumably you weren't using society's presumption of your agency anyway.
replies(1): >>43965409 #
16. WesolyKubeczek ◴[] No.43963660[source]
No, but you are most likely allowed to commercially publish "Hairy Potter and the Philosophizer's Rock", a story about a prehistoric community. The hero is literally a hairy potter who steals a rock from a lazy deadbeat dude who is pestering the rest of the group with his weird ideas.
replies(1): >>43964853 #
17. regularjack ◴[] No.43963721[source]
Then they need to be changed for everyone and not just AI companies, but we all know that ain't happening.
18. jasonlotito ◴[] No.43963908[source]
> But I don't understand how people jump to "Copyright Violation" for the fact of reading.

The article specifically talks about the creation and distribution of a work. Creation and distribution of a work alone is not a copyright violation. However, if you take in input from something you don't own, and genAI outputs something, it could be considered a copyright violation.

Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

> "People should not be allowed to read the book I distributed online if I don't want them to."

This is already done. It's been done for decades. See any case where content is locked behind an account. Only select people can view the content. The license to use the site limits who or what can use things.

So it's odd you would use "insane" to describe this.

> "People should not be allowed to write Harry Potter fanfic in my writing style."

Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it. Most cases of fan fiction are allowed because the author allows it. But no, generally, fan fiction is illegal. This is well known in the fan fiction community. Obviously, if you don't distribute it, that's fine. But we aren't talking about non-distribution cases here.

> "People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

Same with fan fiction. If you replicate a copyrighted piece of art, if you distribute it, that's illegal. If you simply do it for practice, that's fine. But no, if you go around replicating a painting and distribute it, that's illegal.

Of course, technically speaking, none of this is what gen AI models are doing.

> We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics

I agree. Personifying gen AI is useless. We should stick to the technical aspects of what it's doing, rather than trying to pretend it's doing human things when it's 100% not doing that in any capacity. I mean, that's fine for the layman, but anyone with an ounce of technical skill knows that's not true.

replies(3): >>43964018 #>>43964393 #>>43964735 #
19. zelphirkalt ◴[] No.43963943[source]
The law covers these cases pretty well; it is just that the law has very powerful, extremely rich adversaries, whose greed has gotten the better of them again and again. They could use work released sufficiently long ago to be legally available, or take work released as Creative Commons, or run a lookup to make sure they never output verbatim copies of inputs, or outputs within a certain string editing distance (depending on output length), or they could have paid people to reach out to all the people whose work they are infringing upon. But they didn't do any of that, of course, because they think they are above the law.
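The "string editing distance" lookup suggested here could be sketched minimally with the standard library's `difflib`; the similarity threshold and the in-memory source list are illustrative assumptions, not how any real system is built:

```python
import difflib

def too_close(output: str, sources: list[str], threshold: float = 0.9) -> bool:
    """Return True if `output` is within a similarity threshold of any
    known source passage. Stand-in for the edit-distance check above;
    a production system would need an indexed, scalable lookup."""
    norm = " ".join(output.lower().split())
    for src in sources:
        src_norm = " ".join(src.lower().split())
        # ratio() is a normalized similarity in [0, 1], related to edit distance
        if difflib.SequenceMatcher(None, norm, src_norm).ratio() >= threshold:
            return True
    return False

corpus = ["It was the best of times, it was the worst of times."]
print(too_close("It was the best of times, it was the worst of times!", corpus))  # True (near-verbatim)
print(too_close("A completely unrelated sentence about gardening.", corpus))      # False
```

Linear scanning like this is obviously too slow at training-corpus scale, which is part of why the proposal is nontrivial in practice.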
replies(2): >>43964164 #>>43964374 #
20. HappMacDonald ◴[] No.43963945{4}[source]
Do you have any links to cases where people were sued for downloading and consuming content without also uploading (eg, bittorent), hosting, sharing the copyrighted works, etc?
replies(2): >>43965951 #>>43966372 #
21. Aerroon ◴[] No.43964018{3}[source]
>Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it.

Which is a clear failure of the copyright system. Millions of people are expanding our cultural artifacts with their own additions, but all of it is illegal, because they haven't waited another 100 years.

People are interested in these pieces of culture, but they're not going to remain interested in them forever. At least not interested enough to make their own contributions.

22. franczesko ◴[] No.43964079[source]
> Piracy refers to the illegal act of copying, distributing, or using copyrighted material without authorization. It can occur in various forms

Processing of IP without a license AND offering it as a model for money doesn't seem like an unknown use-case to me

23. temporalparts ◴[] No.43964105{3}[source]
The problem isn't that people aren't aware that the scale and magnitude differences are large and significant.

It's that the space of intellectual property LAW does not handle the robust capabilities of LLMs. Legislators NEED to pass laws to reflect the new realities or else all prior case law relies on human analogies which fail in the obvious ways you alluded to.

If there was no law governing the use of death stars and mass murder, and the only legal analogy is to environmental damage, then the only crime the legal system can ascribe is mass environmental damage.

replies(1): >>43964252 #
24. bgwalter ◴[] No.43964130{3}[source]
Does the distinction matter? If humans build a machine that uses so much oxygen that the oxygen levels on earth drop by half, can they say:

"Humans are allowed to breathe, so our machine is too, because it is operated by humans!"

replies(1): >>43964279 #
25. spacemadness ◴[] No.43964131{3}[source]
Sounds like we’re talking about the right of AI company founders and people on HN to acquire wealth from creative works due to some weak argument concerning similarity to the human mind and creation of art. Since we’ve now veered into armchair philosophy territory, I think one could argue that the way human memory works and creates, both physically and mentally, from inspiration is vastly different from how AI works. So saying they’re the same and that’s it is both lazy and takes interesting questions off the table to squash debate.
26. Intralexical ◴[] No.43964159{3}[source]
It's a very consistently Silicon Valley mindset. Seems like almost every company that makes it big in tech, be it Facebook and Google monetizing our personal data, or Uber and Amazon trampling workers' rights, makes money by reducing people to objects that can be bought and sold, more than almost any other industry. No matter the company, all claimed prosocial intentions are just window dressing to convince us to be on board with our own commodification.

That's also why I'm really not worried about the "AI singularity" folks. The hype is IMO blatantly unsubstantiated by the actual capabilities, but gets pushed anyway only because it speaks to this deep-seated faith held across the industry. "AI" is the culmination of an innate belief that people should be replaceable, fungible, perfectly obedient objects, and such a psychosis blinds decision-makers to its actual limits. Only trouble is whether they have the political power to try to force it anyway.

replies(1): >>43967100 #
27. nadermx ◴[] No.43964164[source]
I'm confused, so you're saying it's illegal? Because last I checked, it's still in the process of going through the courts. And lest we forget, copyright's purpose is to advance the arts and sciences. Fair use is codified into law, which states each case is seen on a case-by-case basis, hence the litigation to determine if it is, in fact, legal.
replies(1): >>43964357 #
28. Intralexical ◴[] No.43964252{4}[source]
Why do you think the obvious analogy is LLM=Human, and not LLM=JPEG or LLM=database?

I think you're overstating the legal uniqueness of LLMs. They're covered just fine by the existing legal precedents around copyrighted and derived works, just as building a death star would be covered by existing rules around outer space use and WMDs. Pretending they should be treated differently is IMO the entire lie told by the "AI" companies about copyright.

replies(2): >>43964507 #>>43968544 #
29. TeMPOraL ◴[] No.43964279{4}[source]
Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Point being, laws aren't some God-ordained rules, beautiful in their fractal recursive abstraction, perfectly covering everything that will ever happen in the universe. No, laws are more or less crude hacks that deal with here and now. Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI. This is a new situation, and laws need to be updated to cover it.

replies(1): >>43964747 #
30. ◴[] No.43964280[source]
31. Intralexical ◴[] No.43964305[source]
> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

The direction we're going, it seems more likely it'll be recycling to murder a human.

32. mdhb ◴[] No.43964357{3}[source]
It’s so fucking obviously illegal when you think about it rationally for more than a few seconds. We aren’t even talking about “fair use” we are talking about how it works in practice which was Meta torrenting pirated books, never paying anyone a cent and straight up stealing the content at scale.
replies(2): >>43964700 #>>43964716 #
33. apercu ◴[] No.43964365[source]
>Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

Corporations are not humans. (It's ridiculous that they have some legal protections in the US like humans, but that's a different issue). AI is also not human. AI is also not a chipmunk.

Why the comparison?

34. datavirtue ◴[] No.43964370[source]
Exactly, it is an immense privilege to have your works preserved and promulgated through the ages for instant recall and automated publishing. It's literally what everyone wants. The creators and the consumers. The AI companies are not robbing your money or IP. Period.
35. ashoeafoot ◴[] No.43964374[source]
Obviously a revenue-tracking weight should be trained in, allowing the tracking and collection of all value generated from derivative works.
36. datavirtue ◴[] No.43964393{3}[source]
"However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to."

Absolute horse shit. I can start a 1-900 answer line and use any reference I want to answer your question.

replies(1): >>43964814 #
37. datavirtue ◴[] No.43964441[source]
"If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem."

Why is that? Seems all logic gets thrown out the window when invoking AI around here. References are given. If the user publishes the output without attribution, NOW you have a problem. People are being so rabid and unreasonable here. Totally bat shit.

replies(1): >>43965672 #
38. SilasX ◴[] No.43964448[source]
>My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Huh? If you agree that "learning from copyrighted works to make new ones" has traditionally not been considered infringement, then can you elaborate on why you think it fundamentally changes when you do it with bots? That would, if anything, seem to be a reversal of classic copyright jurisprudence. Up until 2022, pretty much everyone agreed that "learning from copyrighted works to make new ones" is exactly how it's supposed to work, and would be horrified at the idea of having to separately license that.

Sure, some fundamental dynamic might change when you do it with bots, but you need to make that case in an enforceable, operationalized way.

39. staticman2 ◴[] No.43964449{3}[source]
> these comparisons of llms with human artists copying are just ridiculous.

I've come to think of this as the "Performatively failing to recognize the difference between an organism and a machine" rhetorical device that people employ here and elsewhere.

The person making the argument is capable of distinguishing the two things, they just performatively choose not to do so.

replies(1): >>43967517 #
40. datavirtue ◴[] No.43964460[source]
And everyone here is downloading every show and movie in existence without even a hint of guilt.
replies(1): >>43968362 #
41. sdenton4 ◴[] No.43964507{5}[source]
LLMs are certainly not a jpeg or a database...

The Google News snippets case is, in my non-lawyer opinion, the most obvious touch point. And in that case, it was decided that providing large numbers of snippets in search results was non-infringing, despite being a case of copying text from other people at scale... And the reasons this was decided are worth reading and internalizing.

There is not an obvious right answer here. Copyright rules are, in fact, Calvinball, and we're deep in uncharted territory.

replies(1): >>43964597 #
42. bitfilped ◴[] No.43964562[source]
Sorry but AI isn't that useful and I don't see it becoming any more useful in the near term. It's taken since ~1950 to get LLMs working well enough to become popular and they still don't work well.
43. Intralexical ◴[] No.43964597{6}[source]
> LLMs are certainly not a jpeg or a database...

Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material. And the output directly competes against the copyrighted source materials.

The fact they're smudgy and non-deterministic doesn't change how they relate to the rights of authors and artists.

replies(3): >>43964975 #>>43967423 #>>43967466 #
44. palmotea ◴[] No.43964631{3}[source]
>>> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

>> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

> We are talking about the rights of the humans training the models and the humans using the models to create new things.

Then that's even easier, because that prevents appeals to things humans do, like learning, from muddying the waters.

If "training the models" entails loading up copyrighted works into your system (e.g. encoded them during training), you've just copied them into a retrieval system and violated copyright based on established precedent. And people have prompted verbatim copyrighted text out of well-known LLMs, which makes it even clearer.

And then to defend LLM training you're left with BS akin to claiming an ASCII-encoded copy of a book is not a copyright violation, because the book is paper and ASCII is numbers.

45. Intralexical ◴[] No.43964700{4}[source]
A test to apply here: If you or I did this, would it be illegal? Would we even be having this conversation?

The law is supposed to be impartial. So if the answer is different, then it's not really a law problem we're talking about.

46. nadermx ◴[] No.43964716{4}[source]
The fact that you are even using the word stealing is telling of your lack of knowledge in this field. Copyright infringement is not stealing[0]. The propaganda of the copyright cartel has gotten to you.

[0] https://en.wikipedia.org/wiki/Dowling_v._United_States_(1985...

replies(4): >>43965685 #>>43966032 #>>43969091 #>>43983233 #
47. vessenes ◴[] No.43964735{3}[source]
> Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

My proposal is that it's a Luddite knee-jerk reaction to things people don't understand and don't like. They sense and fear change. For instance, here you say it's an issue when AI uses something as a source that you don't have copyright to. Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue". What you said just isn't true. The copyright refers to the right to copy a work.

Distribution: Sure. License your content however you want. That said, in the US a license prohibiting you from READING something just wouldn't be possible. You can limit distribution, copying, etc. This is how journalists can write about sneak previews or leaked information or misfiled court documents released when they should be under seal. The leaking (that is, the distribution) might violate a contract or a license, but the reading thereof is really not a thing that US law or common law think they have a right to control, except in the case of the state classifying secrets. As well, here we have people saying "my song in 1983 that I put out on the radio, I don't want AI listening to that song." Did your license in 1983 prohibit computers from processing your song? Does that mean digital radio can't send it out? Essentially that ship has sailed, full stop, without new legislation.

On my last points, I think you're missing my point, Fan fiction is legal if you're not trying to profit from it. It is almost impossible to perfectly copy a painting, although some people are pretty good at it. I think it's perfectly legal to paint a super close copy of say Starry Night, and sell it as "Starry night by Jason Lotito." In any event, the discourse right now claims its wrong for AI to look at and learn from paintings and photographs.

replies(1): >>43964908 #
48. palmotea ◴[] No.43964747{5}[source]
> Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

> Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI.

The laws are not "entirely ill-equipped to deal with generative AI," unless your interests lie in breaking them. All the hand-waving about the laws being "questionable" and "entirely ill-equipped" is just noise.

Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts. Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

replies(3): >>43965500 #>>43965515 #>>43967544 #
49. caconym_ ◴[] No.43964770[source]
If it was as obvious as you claim, the legal issues would already be settled, and your characterization of what LLMs are doing as "reading and summarizing" is hilariously disingenuous and ignores essentially the entire substance of the debate (which is happening not just on internet forums but in real courts, where real legal professionals and scholars are grappling with how to fit AI into our framework of existing copyright law, e.g.^[1]).

Of course, if you start your thought by dismissing anybody who doesn't share your position as not sane, it's easy to see how you could fail to capture any of that.

^[1] https://arstechnica.com/tech-policy/2025/05/judge-on-metas-a...

50. jasonlotito ◴[] No.43964814{4}[source]
> Absolute horse shit.

I agree, what followed was.

> I can start a 1-900 answer line and use any reference I want to answer your question

Yeah, that's not what we are talking about. If you think it was, you should probably do some more research on the topic.

51. zelphirkalt ◴[] No.43964853{3}[source]
Not sure what you are getting at?
52. jasonlotito ◴[] No.43964908{4}[source]
> My proposal is that it's a luddish kneejerk reaction to things people don't understand and don't like.

Your proposal is moving goal posts.

> Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue".

No, I never said that. Fair Use exists.

> Fan fiction is legal if you're not trying to profit from it.

No, it's not.[1] You can make arguments that it should be, but, no.

[1] https://jipel.law.nyu.edu/is-fanfiction-legal/

> I think you're missing my point

I think you got called out, and you are now trying to reframe your original comment so it comes across as having accounted for the things you were called out on.

You think you know what you are talking about, but you don't. You just rely on the fact that you think you do.

53. zelphirkalt ◴[] No.43964910{3}[source]
Well, you don't get to pick and choose in which situations an LLM is considered similar to a human being and in which it isn't. If you argue that, like a human, it is lossy, well then let's go ahead and get most output checked by organizations and courts for violations of the law and licenses, just like human work is. Oh wait, I forgot, LLMs are run by companies with too much cash to successfully sue them. I guess we just have to live with it then, what a pity.
replies(2): >>43965269 #>>43967046 #
54. SilasX ◴[] No.43964975{7}[source]
The problem is, you can say all of that for human learning-from-copyrighted-works, so that point isn't definitive.
replies(2): >>43967554 #>>43969144 #
55. gruez ◴[] No.43965072[source]
>The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

That might be true but I don't see how it's relevant. There's no provision in copyright law that gives a free pass to humans vs machines, or makes a distinction between them.

replies(1): >>43965379 #
56. philipkglass ◴[] No.43965269{4}[source]
There are a couple of ways to theoretically prevent copyright violations in output. For closed models that aren't distributed as weights, companies could index perceptual hashes of all the training data at a granular level (like individual paragraphs of text) and check/retry output so that no duplicates or near-duplicates of copyrighted training data ever get served as a response to end users.

Another way would be to train an internal model directly on published works, use that model to generate a sanitized corpus of rewritten/reformatted data about the works still under copyright, then use the sanitized corpus to train a final model. For example, the sanitized corpus might describe the Harry Potter books in minute detail but not contain a single sentence taken from the originals. Models trained that way wouldn't be able to reproduce excerpts from Harry Potter books even if the models were distributed as open weights.
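A toy version of the first approach (index granular fingerprints of training data, refuse to serve duplicated output) might look like the following. This uses a plain normalized hash where a real system would use a perceptual/fuzzy hash, so it only catches verbatim matches; the names and paragraph granularity are illustrative:

```python
import hashlib

def fingerprint(paragraph: str) -> str:
    # Normalize case, punctuation, and whitespace, then hash the result.
    # A real perceptual hash would also make near-duplicates collide;
    # exact hashing like this only flags verbatim reproductions.
    kept = "".join(c for c in paragraph.lower() if c.isalnum() or c.isspace())
    return hashlib.sha256(" ".join(kept.split()).encode()).hexdigest()

# Index built once over granular chunks (here, paragraphs) of training data.
training_index = {fingerprint(p) for p in [
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife.",
]}

def can_serve(candidate: str) -> bool:
    """Serve the response only if no paragraph duplicates training data;
    otherwise the caller would retry generation."""
    return all(fingerprint(p) not in training_index
               for p in candidate.split("\n\n") if p.strip())

print(can_serve("It is a truth universally acknowledged, that a single man "
                "in possession of a good fortune, must be in want of a wife."))  # False (blocked)
print(can_serve("An original sentence with no match in the index."))             # True
```

The check-and-retry loop around generation is the part that only works for closed models served behind an API, which is the distinction drawn above.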

57. moralestapia ◴[] No.43965379{3}[source]
In the case of Copyright law, no provision means it will fall in "forbidden" land, not in "allowed" land.

Also in general, grey areas don't mean those things are legal.

Edit: this remains true even if you don't like it, ¯\_(ツ)_/¯.

replies(1): >>43965497 #
58. moralestapia ◴[] No.43965405{3}[source]
>Copyright only comes into play on publication.

Nope.

You have a right to not publish any work that you own. This is protected by Copyright law.

Copyright covers you from the moment you create some sort of original work (in a tangible medium).

59. stevenAthompson ◴[] No.43965409{3}[source]
> I fear the lack of our ability to measure your mind might render you without many of the legal or moral protections you imagine you have.

Society doesn't need to measure my mind, they need to measure the output of it. If I behave like a conscious being, I am a conscious being. Alternatively you might phrase it such that "Anything that claims to be conscious must be assumed to be conscious."

It's the only answer to the p-zombie problem that makes sense. None of this is new, philosophers have been debating it for ages. See: https://en.wikipedia.org/wiki/Philosophical_zombie

However, for copyright purposes we can make it even simpler. If the work is new, it's not covered by the original copyright. If it is substantially the same, it is. Forget the arguments about the ghost in the machine and the philosophical mumbo-jumbo. It's the output that matters.

replies(1): >>43965699 #
60. gruez ◴[] No.43965497{4}[source]
>In the case of Copyright law, no provision means it will fall in "forbidden" land, not in "allowed" land.

AI companies claim it falls under fair use. Pirates use the same excuse too. Just look at all the clips uploaded to youtube with a "it's fair use guys!" note in the description. The only difference between the two is that the former is novel enough that there's plausible arguments for both sides, and the latter has been so thoroughly litigated that you'd be laughed out of the courtroom for claiming that your torrenting falls under fair use.

replies(1): >>43965695 #
61. TeMPOraL ◴[] No.43965500{6}[source]
> Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts.

That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

> Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case. I'm not begrudging them for raking in heaps of money offering generative AI services, because they're legitimately offering value that's at least commensurate with (IMHO much greater than) what they charge, and that value comes entirely from the work they're uniquely able to do, and any individual work that went into training data contributes approximately zero to it.

(GenAI doesn't rely on any individual work in training data; it relies on the breadth and amount being a notable fraction of humanity's total intellectual output. It so happens that almost all knowledge and culture is subject to copyright, so you couldn't really get to this without stepping on some legal landmines.)

(Also, much like AI companies would like the law to favor them, their opponents in this case would like the law to dictate they should be compensated for their works being used in training data, but compensated way beyond any value their works bring in, which in reality is, again, approximately zero.)

replies(1): >>43966813 #
62. ben_w ◴[] No.43965515{6}[source]
> Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

Copyright laws were themselves created by the printing press making it easy to duplicate works, whereas previously if you half-remembered something that was just "inspiration".

But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible canon of all of humanity before them — can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

Now we have AI which are "inspired" (perhaps) by what they read, and half-remember it, in a way that seems similar to pre-printing-press humans sharing stories even if the mechanism is different.

How this is seen according to current law likely varies by jurisdiction; but the law as it is today matters less than what the law will be when the new ones are drafted to account for GenAI.

What that will look like, I am unsure. Could be that for training purposes, copyright becomes eternal… but it's also possible that copyright may cease to exist entirely — laws to protect the entire creative industry may seem good, but if AI displaces all humans from economic activity, will it continue to matter?

replies(2): >>43965733 #>>43966984 #
63. stevenAthompson ◴[] No.43965672{3}[source]
> If the user publishes the output without attribution, NOW you have a problem.

I didn't meant to imply that the AI can't quote Shakespeare in Context, just that it shouldn't try to pass off Shakespeare as it's own or plagiarize huge swathes of the source text.

> People are being so rabid and unreasonable here.

People here are more reasonable than average. Wait until mainstream society starts to really feel the impact of all this.

64. ◴[] No.43965685{5}[source]
65. stevenAthompson ◴[] No.43965692{3}[source]
> you don't need permission, you just need to follow the procedures

Those procedures are how you ask for permission. As you say, it usually involves a fee but doesn't have to.

replies(1): >>43966650 #
66. moralestapia ◴[] No.43965695{5}[source]
Agree. It feels a bit like the earlier days of the Bitcoin world. Eventually the courts decided how it was going to be and people like CZ had to pay a visit to jail, but there is now clear jurisprudence on that.

The same will happen with AI; no one will go to jail, but perhaps it will be ruled that LLMs don't infringe copyright.

(Same thing happened in the early days of YouTube as well, the solution was stuff like MusicDNA, etc...)

67. mjburgess ◴[] No.43965699{4}[source]
In your case, it isn't the output that matters. Your saying "I'm conscious" isn't why we attribute consciousness to you. We would do so regardless of your ability to verbalise anything in particular.

Your radical behaviourism seems an advantage to you when you want to delete one disfavoured part of copyright law, but I assure you, it isn't in your interest. It doesn't universalise well at all. You do not want to be defined by how you happen to verbalise anything, unmoored from your intentions, goals, and so on.

The law, and society, imparts much to you that is never measured, and much that is unmeasurable. What can be measured is, at least, extremely ambiguous with respect to the mental states being attributed. We do not attribute mental states by what people say -- speech plays very little role in this (consider what a mess this would make of watching movies), and none at all, of course, for the large number of animals which share relevant mental states.

Nothing of relevance is measured by an LLM's output. It is highly unambiguous: the LLM has no mental states, and thus is irrelevant to the law, morality, society and everything else.

It's an obscene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort. Hype for LLMs does not lead to a credible theory of minds.

replies(1): >>43966504 #
68. Jensson ◴[] No.43965733{7}[source]
> But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible cannon of all of humanity before them — can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

That is even worse without copyright, as then every previous work would be free and you would have to compete with better works that are also free for people.

replies(1): >>43967536 #
69. dns_snek ◴[] No.43965792[source]
The problem with this kind of analysis is that it doesn't even try to address the reasons why copyright exists in the first place. This belief that training LLMs on content without permission should be allowed is incompatible with the belief that copyright is useful, you really have to pick a lane here.

Go back to the roots of copyright and the answers should be obvious. According to the US constitution, copyright exists "To promote the Progress of Science and useful Arts" and according to the EU, "Copyright ensures that authors, composers, artists, film makers and other creators receive recognition, payment and protection for their works. It rewards creativity and stimulates investment in the creative sector."

If I publish a book and tech companies are allowed to copy it, use it for "training", and later regurgitate the knowledge contained within to their customers then those people have no reason to buy my book. It is a market substitute even though it might not be considered such under our current copyright law. If that is allowed to happen then investment will stop and these books simply won't get written anymore.

70. hochstenbach ◴[] No.43965920[source]
Humans are not allowed to do what AI firms want to do. That was one of the copyright office arguments: a student can't just walk into a library and say "I want a copy of all your books, because I need them for learning".

Humans are also very useful and transformative.

71. MyOutfitIsVague ◴[] No.43965951{5}[source]
There were the famous napster cases, the kids and old ladies that got sued by the RIAA for using limewire to download some music.

There is also the fact that copyright holders will pressure your ISP into sending threatening letters and shutting off your Internet for piracy, even without you seeding. I haven't gotten the impression that you are in the clear for pirating as long as you don't distribute.

72. hulitu ◴[] No.43966032{5}[source]
> The fact you are even using the word stealing, is telling to your lack of knowledge in this field.

I agree. If you can pay the judge, the congress or the president, it is definitely not stealing. It is (the best) democracy (money can buy). /s

replies(1): >>43966238 #
73. nadermx ◴[] No.43966238{6}[source]
So when someone steals something from you, you no longer have it. Yet here they paid the judge(s) because the person who's been "robbed" still has their thing?
74. lavezzi ◴[] No.43966372{5}[source]
There's tonnes, this is a baffling question.
75. stevenAthompson ◴[] No.43966504{5}[source]
> We would do so regardless of your ability to verbalise anything in particular

I don't mean to say that they literally have to speak the words by using their meat to make the air vibrate. Just that, presuming it has some physical means, it be able (and willing) to express it in some way.

> It's a obcene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort.

I appreciate why you might feel that way. However, I feel it's far worse to pretend we have some undetectable magic within us that allows us to perceive the "realness" of other people's consciousness by other than physical means.

Fundamentally, you seem to be arguing that something with outputs identical to a human is not human (or even human like), and should not be viewed within the same framework. Do you see how dangerous an idea that is? It is only a short hop from "Humans are different than robots, because of subjective magic" to "Humans are different than <insert race you don't like>, because of subjective magic."

76. toast0 ◴[] No.43966650{4}[source]
(in the US) Mechanical licenses are compulsory; you don't need permission, you can just follow the forms and pay the fees set by the Copyright Royalty Board (appointed by the Librarian of Congress). You can ask the rightsholder to negotiate a lower fee, but there's no need for consent of the rightsholder if you notify as required (within 30 days of recording and before distribution) and pay the set fees.
replies(1): >>43967107 #
77. FireBeyond ◴[] No.43966745{3}[source]
To the point that Billy Joel "famously" credited the songwriter for one of his songs ("This Night") as "Billy Joel, Ludwig van Beethoven".
78. anigbrowl ◴[] No.43966769[source]
You are if it's parody, cf 'Bored of the Rings'.
79. palmotea ◴[] No.43966813{7}[source]
> That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

It's still copyright infringement if I download a pirated movie and never watch it (writing the bytes to the disk == "training" the disk's "model", reading the bytes back == "generating" from the disk's "model").

> That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case.

IMHO, unless you're massively wealthy and/or running a bigcorp, people like you benefit a lot more from copyright than are harmed by it. In a world without copyright protection, some bigcorp will be able to use its size to extract the value from the works that are out there (i.e. Amazon and Netflix will stop paying royalties instantly, but they'll still have customers because they have the scale to distribute). Copyright just means the little guy who's actually creating has some claim to get some of the value directed back to them.

> and any individual work that went into training data contributes approximately zero to it.

Then cut all those works out of the training set. I don't think it's an excuse that the infringement has to happen on a massive scale to be of value to the generative AI company.

80. palmotea ◴[] No.43966984{7}[source]
> Copyright laws were themselves created by the printing press making it easy to duplicate works, whereas previously if you half-remembered something that was just "inspiration".

Eh. I don't know the history, but my understanding was they were created because the printing press allowed others to deny the original creators the profits from their work, and direct those profits to others who had no hand in it.

After all, in market terms: a publisher that pays its authors can't compete with another publisher that publishes the same works but without paying any authors. A world without copyright is one where some publisher still makes money, but it's a race to the bottom for authors.

> But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible cannon of all of humanity before them — can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

Here's a hole in your thinking: if you like fantasy, would you be content to just re-read Tolkien over and over, forever? Don't you think that'd get boring no matter how good he was?

And empirically, "new creative [people]" manage to compete with Pratchett or Tolkien all the time, as new fantasy works are still being published and read. Do you remember that "Game of Thrones" was a mass cultural phenomenon not too long ago?

replies(1): >>43967772 #
81. Workaccount2 ◴[] No.43967046{4}[source]
Youtube built probably the most complex and proactive copyright system any organization has ever seen, for the sole purpose of appeasing copyright holders. There is no reason to believe they won't do the same thing for LLM output.
82. palmotea ◴[] No.43967100{4}[source]
> That's also why I'm really not worried about the "AI singularity" folks. The hype is IMO blatantly unsubstantiated by the actual capabilities, but gets pushed anyway only because it speaks to this deep-seated faith held across the industry. "AI" is the culmination of an innate belief that people should be replaceable, fungible, perfectly obedient objects, and such a psychosis blinds decision-makers to its actual limits. Only trouble is whether they have the political power to try to force it anyway.

I'm worried because decision-makers genuinely don't seem to be bothered very much by actual capabilities, and are perfectly happy to trade massive reductions in quality for cost savings. In other words, I don't think the limits of LLMs will actually constrain the decision-makers.

replies(1): >>43969158 #
83. stevenAthompson ◴[] No.43967107{5}[source]
Thanks for clarifying. Sometimes I forget that HN has a lot of experts floating around who take things in a very literal and legalistic way. I was speaking in more general terms, and missed that you were being very precise with your language.

Compulsory licenses are interesting, aren't they? It just feels wrong. If Metallica doesn't want me to butcher their songs, why should they be forced to allow it?

replies(2): >>43967433 #>>43967596 #
84. sdenton4 ◴[] No.43967423{7}[source]
Nothing in copyright law talks about 'semantic meaning' or 'character of the source material'. Really, quite the opposite - the 'expression-idea dichotomy' says that you're copyrighting the expression of an idea, not the idea itself. https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...

(Leaving aside whether the weights of an LLM does actually encode the content of any random snippet of training text. Some stuff does get memorized, but how much and how exactly? That's not the point of the LLM, unlike the jpeg or database.)

And, again, look at the search snippets case - these were words produced by other people, directly transcribed, so open-and-shut from a certain point of view. But the decision went the other way.

85. skolskoly ◴[] No.43967433{6}[source]
Any live band performing a song is subject to licensing just as much as a recording artist (live performance falls under performance rights rather than the mechanical license). Typically the venue pays it, just like how radio stations pay royalties. This system exists because historically, that's how music reproduction worked. You hire some musicians to play the music you want to hear. Copyright applied to the score, the lyrics, and so on. The 'mechanical' rights had to come later, because recording hadn't been invented yet!
86. Suppafly ◴[] No.43967443[source]
>The fatal flaw in your reasoning: machines aren't humans.

I don't see how that affects the argument. The machines are being used by humans. Your argument then boils down to the idea that you can do something manually but it becomes illegal if you use a tool to do it efficiently.

replies(1): >>43967533 #
87. Suppafly ◴[] No.43967466{7}[source]
>Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material.

That sounds like you're arguing that they should be legal. Copyright law protects specific expressions, not handwavy "smudgy and non-deterministic" things.

replies(1): >>43969125 #
88. Suppafly ◴[] No.43967517{4}[source]
>The person making the argument is capable of distinguishing the two things, they just performatively choose not to do so.

I think that sort of assumption of insincerity is worse than what you're accusing them of. You might not like their argument, but it's not inherently incorrect for them to argue that because humans have the right to do something, humans have the right to use tools to do that something and humans have the right to group together and use those tools to do something at a large scale.

replies(1): >>43973795 #
89. const_cast ◴[] No.43967533{3}[source]
It's not about the tool, how you use it, or even how it works. It's about the end result.

I can go through and manually compress "Revenge of the Sith" and then post it online. Or, I can use a compression program like handbrake. Regardless, it is copyright infringement.

Can AI reproduce almost* the same things that exist in its training data? Sometimes, so sometimes it's copyright infringement. Doesn't help that it's explicitly for-profit and seeks to obsolesce and siphon value from its training material.

replies(1): >>43967637 #
90. Suppafly ◴[] No.43967536{8}[source]
>that are also free for people

sounds like a good deal if you're people.

91. Suppafly ◴[] No.43967544{6}[source]
>Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts.

If it were that cut and dried we wouldn't have this conversation at all, so clearly your position isn't objectively true.

92. const_cast ◴[] No.43967554{8}[source]
The difference is we're humans, so we get special privileges. We made the laws.

If we're going to be giving some rights to LLMs for convenient for-profit ventures, I expect some in-depth analysis on whether that is or is not slavery. You can't just anthropomorphize a computer program when it makes you money but then conveniently ignore the hundreds of years of development of human rights. If that seems silly, then I think LLMs are probably not like humans and the comparisons to human learning aren't justified.

If it's like a human, that makes things very complicated.

93. toast0 ◴[] No.43967596{6}[source]
They are very interesting. IMHO, it's a nice compromise between making sure artists are paid for their work and giving them complete control over it. Licensing for radio-style play is also compulsory, and terrestrial radio used to not even have to pay the recording artists (I think this changed?), but did have to track plays and pay ASCAP.

As a consumer, it would amazing if there were compulsory licenses for film and tv; then we wouldn't have to subscribe to 70 different services to get to the things we want to see. And there would likely be services that spring up to redistribute media where the rightsholders aren't able to or don't care to; it might be pulled from VHS that fans recorded off of TV in the old days, but at least it'd be something.

94. Suppafly ◴[] No.43967637{4}[source]
>Sometimes, so sometimes it's copyright infringement.

So in those cases, the original authors might have a case. Generally you don't see these LLMs doing that, though.

>Doesn't help that it's explicitly for-profit and seeks to obsolesce and siphon value from it's training material.

Doesn't hurt either. That's a reason to be butthurt, but that's not a legal argument.

replies(1): >>43967723 #
95. const_cast ◴[] No.43967723{5}[source]
> That's a reason to be butthurt, but that's not a legal argument.

It is a legal argument, fair use specifically takes into account the intention. Just using it for commercial ventures makes the water hotter.

replies(1): >>43976273 #
96. ben_w ◴[] No.43967772{8}[source]
> A word without copyright is one where some publisher still makes money, but it's a race to the bottom for authors.

This is the case anyway; there are many writers competing for the opportunity to be published, so the publishers have a massive advantage, and it is the technology of printing (and cheap paper) that makes this a one-sided relationship — if every story teller had to be heard in person, with no recordings or reproductions possible, then story tellers would be found in every community, and they would be valued by their community.

> Here's a hole in your thinking: if you like fantasy, would you be content to just re-read Tolkien over and over, forever? Don't you think that'd get boring no matter how good he was?

The examples aren't meant to be exclusive, and Pratchett has a lot of books.

There's far more books on the market right now than a human can read in a lifetime. At some point, we may have already passed it, there will be far more good books on the market than a human can read in a lifetime, at which point it's not quality, it's fashion.

> And empirically, "new creative [people]" manage to complete with Pratchett or Tolkien all the time, as new fantasy works are still being published and read.

At some point, there will be more books at least as good as Pratchett, Tolkien, Le Guin, McCaffrey, Martin, Heinlein, Niven etc. in each genre, than anyone can read.

> Do you remember that "Game of Thrones" was a mass cultural phenomenon not too long ago?

Published: August 1, 1996 — concurrently with Pratchett.

Better example would have been The Expanse — worth noting that SciFi has a natural advantage over (high) fantasy or romance, as the nature of speculative science fiction means it keeps considering futures that are rendered as obsolete as the worn-down buttons on the calculator that Hari Seldon was rumoured to keep under his pillow.

97. encipriano ◴[] No.43968362{3}[source]
Why would you feel guilty about using an unlimited resource? You're not stealing.
98. kbelder ◴[] No.43968544{5}[source]
If they were a database, they would be unquestionably legal, because they're only storing a tiny fraction of one percent of the data from any document, and even that data is not any particular replica of any part of the document, but highly summarized and transformed.
replies(1): >>43969148 #
99. johnnyanmac ◴[] No.43969065{3}[source]
> art is derivative in some sense, it's almost always just a matter of degree.

Yes, that's why we judge on a case by case basis. The line is blurry.

I think that when you're storing copies of such assets in your database, you're well past the line, though.

100. johnnyanmac ◴[] No.43969091{5}[source]
> Copyright infringement is not stealing

If we can agree that the taking of your time is theft (wage theft, to be precise), we as those who rely on intellect in our careers should be able to agree that the taking of our ideas is also theft.

>moved to the Ninth Circuit Court of Appeals, where he argued that the goods he was distributing were not "stolen, converted or taken by fraud", according to the language of 18 U.S.C. 2314 - the interstate transportation statute under which he was convicted. The court disagreed, affirming the original decision and upholding the conviction. Dowling then took the case to the Supreme Court, which sided with his argument and reversed the convictions.

This just tells me that the definition is highly contentious. Having the Supreme Court reverse a federal appeals ruling already shows misalignment.

101. johnnyanmac ◴[] No.43969125{8}[source]
LLMs can't express, and that's the primary issue. You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".
replies(2): >>43976226 #>>43976269 #
102. johnnyanmac ◴[] No.43969144{8}[source]
Scales of effect always come into play when enacting law. If you spend a day digging a hole on the beach, you're probably not going to incur much wrath. If you bring a crane to the beach, you'll be stopped, because we know the hole that can be made will disrupt the natural order. A human can do the same thing eventually, but does it so slowly that it's not an issue to enforce 99.9% of the time.
replies(1): >>43969886 #
103. johnnyanmac ◴[] No.43969148{6}[source]
Given that you can in fact prompt enough to reproduce a source image, I'm not convinced that is the actual truth of the matter.
104. johnnyanmac ◴[] No.43969158{5}[source]
It will when it inevitably hits their wallets. Be it via the public rejection of a lower quality product, or court orders. But both sentiments move slow, so we're in here for a while.

Even with NFTs it still was a full year+ of everyone trying to shill them out before the sentiment turned. Machine learning, meanwhile, is actually useful but is being shoved into every hole.

105. SilasX ◴[] No.43969886{9}[source]
That's just the usual hand-wavy, vague "it's different" argument. If you want to justify treating the cases differently based on a fundamental difference, you need to be more specific. For example, they usually define an amount of rainwater you can collect that's short of disrupting major water flows.

So what is the equivalent of "digging too much" on a beach for AI? What fundamentally changes when you learn hyper-fast versus just reading a bunch of horror novels to inform better horror-novel writing? What's unfair about AI compared to learning from published novels how to properly pace your story?

These are the things you need to figure out before making a post equating AI learning with copyright infringement. "It's different" doesn't cut it.

106. staticman2 ◴[] No.43973795{5}[source]
Anyone writing "humans can learn from art why can't machines" or something to that effect is performatively conflating an organism and a machine.

My issue is with the rhetoric, if that isn't the rhetoric you are using I am not talking about you.

replies(1): >>43976263 #
107. sdenton4 ◴[] No.43976226{9}[source]
That's certainly an opinion.
108. Suppafly ◴[] No.43976263{6}[source]
My issue is that your rhetoric of "performatively conflating an organism and a machine" doesn't address the core issue of "humans can learn from art why can't machines". You're essentially saying that you don't like the question so you're refusing to answer it. There is nothing inherently wrong with training machines on existing data, if you want us to believe there is, you need to have some argument about what that would be the case.

Is your argument simply about your interpretation of copyright law and your mentality being that laws are good and breaking them is bad? Because that doesn't seem to be a very informed position to take.

replies(1): >>43976804 #
109. Suppafly ◴[] No.43976269{9}[source]
>You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".

And yet collage artists do that all the time.

replies(1): >>43982167 #
110. Suppafly ◴[] No.43976273{6}[source]
>It is a legal argument

Not a very good one then.

111. p0w3n3d ◴[] No.43976732[source]
it's funny how a law becomes potentially-outdated only when big corporations want to violate in on a global scale.

As a private person I no longer feel incentivised to create new content online because I think that all I create will eventually be stolen from me...

112. staticman2 ◴[] No.43976804{7}[source]
My stated opinion is that anyone who comes to an AI conversation and says "I can't tell the difference between organisms and computers," or some variation thereof, does in fact have no trouble in practice distinguishing between their child/ mom/ dad/ BFF and ChatGPT, and is in fact questioning from a position of bad faith.

"There is nothing inherently wrong with training machines on existing data..." doesn't really conflate a machine with an organism and isn't what I'm talking about.

If you instead had written "I can read the Cat in the Hat to teach my kid to read why can't I use it to train an LLM?"

Then I do think you would be asking with a certain degree of bad faith; you are perfectly capable of distinguishing those two things, in practice, in your everyday life. You do not in fact see them as equivalent.

Your rhetorical choice to be unable to tell the difference would be performative.

You seem to think I'm arguing copyright policy. I really am discussing rhetoric.

113. johnnyanmac ◴[] No.43982167{10}[source]
I'll remind you that all fanart is technically in a gray area of copyright infringement. Legally speaking, companies can take down and charge infringement for anything using their IP that's not under fair use. Collages don't really pass that benchmark.

Yoinking their IP and mass-producing slop sure is a line to cross, though.

replies(1): >>43984849 #
114. zelphirkalt ◴[] No.43983233{5}[source]
I still feel like the point is useless, because at the end of the day, if some normal person went ahead and did the same thing the tech giant did, they would long since have been moved to a less comfortable new home, one with high security against breaking out. At the end of the day, the situation now is that some are more equal than others, and it is unacceptable; yet, due to the mountains of (also unethically acquired) cash they have, they can get away with something a normal person cannot. Even the law might be bent to their will, because if suing them fails, it sets a precedent.

If we end up saying it is not illegal, then I demand, that it will not be illegal for everyone. No double standards please. Let us all launder copyrighted material this way, labeling it "AI".

115. temporalparts ◴[] No.43984849{11}[source]
I'm not an expert, but I thought fan art that people try to monetize in some form is explicitly illegal unless it's protected as parody, and any non-commercial "violations" of copyright are totally legal. Disney can't stop me from drawing Mickey in the privacy of my own house, just from monetizing/getting famous off of them.