Copyright in its current form is ridiculous, but I support some (much-pared-back) version of copyright that limits rights further, expands fair use, repeals the DMCA, and reduces the copyright term to something on the order of 15-20 years (perhaps with a renewal option as with patents).
I've released a lot of software under the GPL, and the GPL in its current form couldn't exist without copyright.
What copyright should do is protect individual creators, not corporations. And it should protect them even if their work is mixed through complex statistical algorithms such as LLMs.
LLMs wouldn't be possible without _trillions_ of hours of work by the people writing the books, code, music, etc. that they are trained on. The _millions_ of hours of work spent on the training algorithm itself, the chat interface, the scraping scripts, and so on are barely a drop in the bucket.
There is zero reason the people who put in the mere millions of hours should get all the reward while giving nothing to the rest of the world, who put in the trillions.
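To make those orders of magnitude concrete, here's a back-of-envelope sketch. Every figure in it is a made-up illustrative assumption, not a measurement:

```python
# Back-of-envelope only; all numbers below are illustrative assumptions.
contributors = 50_000_000      # people whose books/code/music ended up in training sets
hours_each = 20_000            # rough lifetime creative output, ~10 working years
training_data_hours = contributors * hours_each   # 1e12: a trillion hours

lab_headcount = 10_000         # engineers/researchers on the model, UI, scrapers
lab_hours_each = 10_000        # ~5 working years each
lab_hours = lab_headcount * lab_hours_each        # 1e8: a hundred million hours

print(f"training data : lab work = {training_data_hours // lab_hours:,} : 1")
# -> 10,000 : 1
```

Tweak the assumptions however you like; the ratio stays lopsided by several orders of magnitude.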
It's not only about verbatim regurgitation. That just means it gets caught more easily.
LLMs are just another way the uber rich try to exploit everyone, hoping that if they exploit every single person's work just a little, they will get away with it.
Nobody is 1000x more productive than the average programmer at writing code. There is no reason somebody should make 1000x more money from it either.
This isn't really how derivative works operate.
If you read Harry Potter and you decide you want to write a book about how Harry and Hermione grow up and become professors at Hogwarts, that's probably going to be a derivative work.
If you read Harry Potter and decide you want to write a book about a little Korean girl who lives with abusive parents but has a knack for science and crawls her way out of that family by inventing things for an eccentric businessman, is that a derivative of Harry Potter? Probably not, even if that was the inspiration for it.
To be a derivative work it has to be pretty similar to the original. That's actually the test: it's based on similarity. And mixing a work with so many other things that it's no longer sufficiently similar to any one of them is exactly how you stop it from being a derivative.
Two distinctions matter here: how things work now vs. how they should work, and also how it works when a human does something vs. when an LLM is used to generate something imitating the human work.
A human has limited time and memory. Human time is valuable; computer time is not. For a human, even memorizing something takes time.
When a human is inspired by a work and writes something based on it, he invests a lot of time and energy into it. That is why people have decided that this creative output should be protected by the law.
A human is also limited by how much he can remember from the original work. Even when writing what you described, he would inevitably fall back on his own life experiences, opinions, attitudes, ways of thinking, etc.
When an LLM is used, it generates a statistical mashup of the works it ingested during training. No part of this process has any intrinsic value; it literally costs only what the electricity does. And it's almost infinitely scalable. The law might not call the result derivative because it was written at a time when this kind of mechanical derivation was not feasible.
BTW, I like that you spell it GAI. General artificial intelligence feels more natural to say. I wonder if there's some rule of English I don't know which makes AGI more correct or if all the highly educated people are just trying to avoid sounding like they're saying "gay".
But they are still based on the training data. An untrained model is a random noise generator. A model trained exclusively on GPL code will therefore obviously only generate useful code thanks to the GPL input. The output is literally derived from the "training data" input and the prompt.
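To illustrate the point (a toy sketch, nothing like a real LLM): a character-level "model" with no training samples uniform noise, while the same machinery "trained" on a tiny corpus can only echo that corpus. The corpus string here is just a stand-in for the GPL code in the example above.

```python
import random
from collections import Counter, defaultdict

corpus = "free software is free as in freedom "  # stand-in for the GPL training set

def sample_untrained(n=40):
    # No training signal: every next character is uniform random noise.
    vocab = sorted(set(corpus))
    return "".join(random.choice(vocab) for _ in range(n))

def sample_trained(n=40):
    # "Training": count character bigrams in the corpus, then sample from them.
    bigrams = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        bigrams[a][b] += 1
    out = corpus[0]
    for _ in range(n - 1):
        nxt = bigrams[out[-1]]
        out += random.choices(list(nxt), weights=list(nxt.values()))[0]
    return out

print(sample_untrained())  # gibberish: the model itself contributes nothing useful
print(sample_trained())    # fragments recognizably derived from the corpus
```

Swap the corpus and the "useful" output changes with it; the sampling code stays exactly the same. Everything recognizable in the output came in through the training data.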
Now, given that the training input is more substantial than the prompt by orders of magnitude, the prompt is basically irrelevant.
So what the license of the output should be based on is the training data. The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so any license is only a minority of the input. It's just manipulation.
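For a sense of scale behind "orders of magnitude" above (both figures are ballpark assumptions, not measurements):

```python
# Rough scale comparison; both numbers are ballpark assumptions.
training_tokens = 10 ** 13   # on the order of 10T tokens for a large training run
prompt_tokens = 10 ** 3      # a generous thousand-token prompt

print(f"training data outweighs the prompt ~{training_tokens // prompt_tokens:,}x")
# -> ~10,000,000,000x
```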
An obvious practical problem with this is that the licenses are variously incompatible with one another:
https://en.wikipedia.org/wiki/License_compatibility
> The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so any license is only a minority of the input.
Whether it's an intelligent entity or not doesn't really enter into it. The real question is whether the output takes enough from some particular input to make it a derivative, which ought to depend on what a given output actually looks like.