US Copyright Office found AI companies breach copyright. Its boss was fired

(www.theregister.com)

mattxxx ◴[12 May 25 13:58 UTC] No.43962976[source]▶

>>43961247 (OP) #

Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology
2. Matches a potentially-outdated, strict interpretation of copyright law

My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use case, and then we should change them.

replies(19): >>43963017 #>>43963125 #>>43963168 #>>43963214 #>>43963243 #>>43963311 #>>43963423 #>>43963517 #>>43963612 #>>43963721 #>>43963943 #>>43964079 #>>43964280 #>>43964365 #>>43964448 #>>43964562 #>>43965792 #>>43965920 #>>43976732 #

palmotea[dead post] ◴[12 May 25 14:14 UTC] No.43963168[source]▶

>>43962976 #

[flagged]

ulbu ◴[12 May 25 14:35 UTC] No.43963480[source]▶

>>43963168 #

these comparisons of llms with human artists copying are just ridiculous. it’s saying “well humans are allowed to break twigs and damage the planet in various ways, so why not allow building a fucking DEATH STAR”.

abstracting llms from their operators and owners and possible (and probable) ends and the territories they trample upon is nothing short of eye-popping to me. how utterly negligent and disrespectful of fellow people must one be at the heart to give any credence to such arguments

replies(3): >>43964105 #>>43964159 #>>43964449 #

temporalparts ◴[12 May 25 15:26 UTC] No.43964105[source]▶

>>43963480 #

The problem isn't that people aren't aware that the scale and magnitude differences are large and significant.

It's that the space of intellectual property LAW does not handle the robust capabilities of LLMs. Legislators NEED to pass laws to reflect the new realities or else all prior case law relies on human analogies which fail in the obvious ways you alluded to.

If there was no law governing the use of death stars and mass murder, and the only legal analogy is to environmental damage, then the only crime the legal system can ascribe is mass environmental damage.

replies(1): >>43964252 #

1. Intralexical ◴[12 May 25 15:38 UTC] No.43964252[source]▶

>>43964105 #

Why do you think the obvious analogy is LLM=Human, and not LLM=JPEG or LLM=database?

I think you're overstating the legal uniqueness of LLMs. They're covered just fine by the existing legal precedents around copyrighted and derived works, just as building a death star would be covered by existing rules around outer space use and WMDs. Pretending they should be treated differently is IMO the entire lie told by the "AI" companies about copyright.

replies(2): >>43964507 #>>43968544 #

2. sdenton4 ◴[12 May 25 16:01 UTC] No.43964507[source]▶

>>43964252 (TP) #

LLMs are certainly not a jpeg or a database...

The google news snippets case is, in my non-lawyer opinion, the most obvious touch point. And in that case, it was decided that providing large numbers of snippets in search results was non-infringing, despite being a case of copying text from other people at-scale... And the reasons this was decided are worth reading and internalizing.

There is not an obvious right answer here. Copyright rules are, in fact, Calvinball, and we're deep in uncharted territory.

replies(1): >>43964597 #

3. Intralexical ◴[12 May 25 16:09 UTC] No.43964597[source]▶

>>43964507 #

> LLMs are certainly not a jpeg or a database...

Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material. And the output directly competes against the copyrighted source materials.

The fact they're smudgy and non-deterministic doesn't change how they relate to the rights of authors and artists.

replies(3): >>43964975 #>>43967423 #>>43967466 #

4. SilasX ◴[12 May 25 16:43 UTC] No.43964975{3}[source]▶

>>43964597 #

The problem is, you can say all of that for human learning-from-copyrighted-works, so that point isn't definitive.

replies(2): >>43967554 #>>43969144 #

5. sdenton4 ◴[12 May 25 21:01 UTC] No.43967423{3}[source]▶

>>43964597 #

Nothing in copyright law talks about 'semantic meaning' or 'character of the source material'. Really, quite the opposite - the 'expression-idea dichotomy' says that you're copyrighting the expression of an idea, not the idea itself. https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...

(Leaving aside whether the weights of an LLM do actually encode the content of any random snippet of training text. Some stuff does get memorized, but how much and how exactly? That's not the point of the LLM, unlike the jpeg or database.)

And, again, look at the search snippets case - these were words produced by other people, directly transcribed, so open-and-shut from a certain point of view. But the decision went the other way.

6. Suppafly ◴[12 May 25 21:08 UTC] No.43967466{3}[source]▶

>>43964597 #

>Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material.

That sounds like you're arguing that they should be legal. Copyright law protects specific expressions, not handwavy "smudgy and non-deterministic" things.

replies(1): >>43969125 #

7. const_cast ◴[12 May 25 21:18 UTC] No.43967554{4}[source]▶

>>43964975 #

The difference is we're humans, so we get special privileges. We made the laws.

If we're going to be giving some rights to LLMs for convenient for-profit ventures, I expect some in-depth analysis on whether that is or is not slavery. You can't just anthropomorphize a computer program when it makes you money but then conveniently ignore the hundreds of years of development of human rights. If that seems silly, then I think LLMs are probably not like humans and the comparisons to human learning aren't justified.

If it's like a human, that makes things very complicated.

8. kbelder ◴[12 May 25 23:55 UTC] No.43968544[source]▶

>>43964252 (TP) #

If they were a database, they would be unquestionably legal, because they're only storing a tiny fraction of one percent of the data from any document, and even that data is not any particular replica of any part of the document, but highly summarized and transformed.

replies(1): >>43969148 #

9. johnnyanmac ◴[13 May 25 02:09 UTC] No.43969125{4}[source]▶

>>43967466 #

LLMs can't express, that's the primary issue. You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".

replies(2): >>43976226 #>>43976269 #

10. johnnyanmac ◴[13 May 25 02:13 UTC] No.43969144{4}[source]▶

>>43964975 #

Scales of effect always come into play when enacting law. If you spend a day digging a hole on the beach, you're probably not going to incur much wrath. If you bring a crane to the beach, you'll be stopped because we know the hole that can be made will disrupt the natural order. A human can do the same thing eventually, but does it so slowly that it's not an issue to enforce 99.9% of the time.

replies(1): >>43969886 #

11. johnnyanmac ◴[13 May 25 02:14 UTC] No.43969148[source]▶

>>43968544 #

Given that you can in fact prompt enough to reproduce a source image, I'm not convinced that is the actual truth of the matter.

12. SilasX ◴[13 May 25 05:41 UTC] No.43969886{5}[source]▶

>>43969144 #

That's just the usual hand-wavy, vague "it's different" argument. If you want to justify treating the cases differently based on a fundamental difference, you need to be more specific. For example, they usually define an amount of rainwater you can collect that's short of disrupting major water flows.

So what is the equivalent of "digging too much" in a beach for AI? What fundamentally changes when you learn hyper-fast vs just read a bunch of horror novels to inform better horror novel-writing? What's unfair about AI compared to learning from published novels about how to properly pace your story?

These are the things you need to figure out before making a post equating AI learning with copyright infringement. "It's different" doesn't cut it.

13. sdenton4 ◴[13 May 25 18:43 UTC] No.43976226{5}[source]▶

>>43969125 #

That's certainly an opinion.

14. Suppafly ◴[13 May 25 18:46 UTC] No.43976269{5}[source]▶

>>43969125 #

>You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".

And yet collage artists do that all the time.

replies(1): >>43982167 #

15. johnnyanmac ◴[14 May 25 08:16 UTC] No.43982167{6}[source]▶

>>43976269 #

I'll remind you that all fanart is technically in a gray area of copyright infringement. Legally speaking, companies can take down and charge infringement for anything using their IP that's not under fair use. Collages don't really pass that benchmark.

Yoinking their IP and mass producing slop sure is a line to cross, though.

replies(1): >>43984849 #

16. temporalparts ◴[14 May 25 14:17 UTC] No.43984849{7}[source]▶

>>43982167 #

I'm not an expert, but I thought fan art that people try to monetize in some form is explicitly illegal unless it's protected as parody, and that any noncommercial "violations" of copyright are totally legal. Disney can't stop me from drawing Mickey in the privacy of my own house, just from monetizing or getting famous off of him.
