Not having sensible people steering copyright toward winding down its scope is now being paired with a court that's likely to make it far more draconian, creating massive problems for software development.
The reality of course is more complicated. Without copyright there's no GPL. Which I guess is fine if you're in the OSS camp more than the FSF camp. MIT and BSD licenses functionally give up copyright.
Copyright is also what allows for hybrids like the BSL which protect "little guys" from large cloud providers like AWS etc.
Copyright allows VC startups to at least start out life as Open Source (before pivoting later.)
Of course this is all in the context of software copyright. Other copyrights (music, books etc) are equally nuanced.
And there are other forms of IP protections as well (patents, trademarks) which are distinct from the copyright concept.
So no, I don't think most people here are against copyright (patents are a different story.)
If I were to guess, I would imagine most people here believe in some copyright, and not total anarchy.
2. I generally don't like the BSL.
3. No comment. I think OSS projects that exist incidentally versus being the company's main product have always been more reliable (and less susceptible to the company pivoting to closed-only offerings).
4. Copyright has perhaps been the most evil in the music industry; books, less so. I'd rather not even talk about movies or TV right now. Nonetheless, I'd tolerate an extremely limited duration copyright, if no copyright at all isn't an option.
5. Trademarks are mostly fine, because they're primarily supposed to serve customers, not the companies. I'd like to get rid of patents now, however.
Copyright in its current form is ridiculous, but I support some (much-pared-back) version of copyright that limits rights further, expands fair use, repeals the DMCA, and reduces the copyright term to something on the order of 15-20 years (perhaps with a renewal option as with patents).
I've released a lot of software under the GPL, and the GPL in its current form couldn't exist without copyright.
It would be nice if FOSS were the baseline, but I don't see that ever happening, especially in a world without an enforcement mechanism.
What copyright should do is protect individual creators, not corporations. And it should protect them even if their work is mixed through complex statistical algorithms such as LLMs.
LLMs wouldn't be possible without the _trillions_ of hours of work people put into writing the books, code, music, etc. they are trained on. The _millions_ of hours spent on the training algorithm itself, the chat interface, the scraping scripts, etc. are barely a drop in the bucket.
There is 0 reason the people who spent mere millions of hours of work should get all the reward without giving anything to the rest of the world who put in trillions of hours.
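As a rough sanity check on those orders of magnitude, here's a back-of-envelope calculation; every input figure is an illustrative assumption, not a measurement:

    # Back-of-envelope check on the "trillions vs. millions" claim.
    # Every number here is an assumed, illustrative figure.
    contributors = 500_000_000   # assumed authors represented in a web-scale corpus
    hours_each = 2_000           # assumed average creative hours per author
    data_hours = contributors * hours_each   # 1e12, i.e. ~1 trillion hours

    engineers = 2_000            # assumed lab headcount over the project
    lab_hours = engineers * 5 * 2_000        # 5 years at 2,000 h/yr -> 2e7 hours

    print(f"data-to-lab ratio: {data_hours / lab_hours:,.0f} to 1")  # ~50,000 to 1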
Your point remains, but the problem of dividing responsibility and financial credit doesn't go away with that alone. Do you know if the OpenAI lawsuits have laid this out?
Sure, having source code would be nice, but then again half the software nowadays runs on Electron and is written in JavaScript anyway. There are also plenty of examples of hardware manufacturers using software/firmware copyright as an excuse to make legal threats against people who wrote their own software to control hardware they bought, even though those people never had access to the original source code.
There are probably more examples of people reverse engineering and reimplementing or decompiling large, nontrivial software than there are examples of companies open-sourcing their whole product because they used a GPL-licensed library (as opposed to avoiding the GPL-licensed code, or violating the GPL by not releasing the source).
That doesn't mean the GPL is ineffective. It forces them to reimplement the functionality, giving copyleft projects more time to compete. Imagine if they were free to take all public code and just use it: they would always be ahead, and open source products wouldn't stand a chance.
Not to mention, I suspect the GPL being so strong is exactly why big companies pretend to love open source but push permissive licenses so hard: to drown out the GPL competition they hate, and to attract more developers to permissive rather than copyleft projects.
This is extend-and-extinguish on rails. Raise capital, hire a team to fork a public project, develop it closed, and only release inscrutable blobs. Add a marketing budget and you get to piggyback on the open-source project while keeping the monetisation.
With code, some licenses are compatible: for example, you could take a model trained on GPL and MIT code and use it to produce GPL code. (The resulting model would _of course_ also be a derivative work licensed under the GPL.) That addresses the biggest elephant in the room, giving users the right to inspect and modify the code. Giving credit to individual authors is more difficult, though.
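To sketch the direction-matters point in code, here's a toy one-directional compatibility check; the table is a deliberate simplification covering just the licenses mentioned here, and certainly not legal advice:

    # Toy sketch: which input licenses may flow into a given output license.
    # The table is a simplification for the GPL/MIT example above.
    COMPATIBLE_INPUTS = {
        "GPL-3.0": {"GPL-3.0", "LGPL-3.0", "MIT", "BSD-3-Clause", "Apache-2.0"},
        "MIT":     {"MIT", "BSD-3-Clause"},  # copyleft inputs can't be relicensed as MIT
    }

    def can_combine(inputs, output):
        """True if every input license may be incorporated under the output license."""
        return all(lic in COMPATIBLE_INPUTS.get(output, set()) for lic in inputs)

    print(can_combine({"GPL-3.0", "MIT"}, "GPL-3.0"))  # True: GPL+MIT in, GPL out
    print(can_combine({"GPL-3.0", "MIT"}, "MIT"))      # False: can't go the other way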
I haven't been following the lawsuits much; I'm powerless to influence them. But having written my fair share of GPL and AGPL code, this whole LLM thing feels like being spat in the face.
It's not only about verbatim regurgitation. Doing that just means it gets caught more easily.
LLMs are just another way the uber rich try to exploit everyone, hoping that if they exploit every single person's work just a little, they will get away with it.
Nobody is 1000x more productive than the average programmer at writing code. There is no reason somebody should make 1000x more money from it either.
Obfuscation techniques. Compatibility updates. Hell, hardware-enforced DRM.
This isn't really how derivative works operate.
If you read Harry Potter and you decide you want to write a book about how Harry and Hermione grow up and become professors at Hogwarts, that's probably going to be a derivative work.
If you read Harry Potter and decide you want to write a book about a little Korean girl who lives with abusive parents but has a knack for science and crawls her way out of that family by inventing things for an eccentric businessman, is that a derivative of Harry Potter? Probably not, even if that was the inspiration for it.
To be a derivative work it has to be pretty similar to the original. That's actually the test: it's based on similarity. And the way you cause something not to be a derivative is exactly by mixing it with so many other things that it's no longer sufficiently like any of them.
It can be as simple as "you cannot train on someone's work for commercial use without a license", or as complex as setting up some Spotify-like model based on the number of times the LLM references those works in what it generates. The devil's in the details, but the problem itself isn't new.
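A minimal sketch of the Spotify-like variant, splitting a revenue pool pro rata by how often each work is referenced; the works, counts, and pool size are all made-up numbers:

    # Hypothetical pro-rata payout across training works, weighted by how
    # often the LLM drew on each one. All figures invented for illustration.
    reference_counts = {
        "author_a/novel":   120_000,
        "author_b/gpl_lib":  45_000,
        "author_c/songs":    35_000,
    }
    revenue_pool = 1_000_000.00  # assumed dollars set aside for rights holders

    total = sum(reference_counts.values())
    for work, count in reference_counts.items():
        print(f"{work}: ${revenue_pool * count / total:,.2f} ({count / total:.1%})")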
>Dividing equal share based on inputs would require the company to potentially expose proprietary information.
I find this defense ironic, given that so much of this debate revolves around defining copyright infringement. The works being trained on are infringed upon, but we might give away too many details about the tech used to siphon all these IPs? Tragic.
>Do you know if the OpenAI lawsuits have laid this out?
IANAL, but my understanding is that the high-profile cases are trending toward "you can't train on this" litigation rather than "how do we set up a payment model" questions. If that's correct, we're pretty far out from considering that.
Do they work against Red Hat or Intel or Google or Mozilla, once those organizations can openly distribute the reconstructed code they've assigned full-time people to decompile? For that matter, what stops any government from doing it to any foreign company?
Which hardware company is going to build your DRM if there is no law you can use to stop the same company from also selling circumvention tools, or stopping anyone else (including major corporations) from extracting keys and selling them openly?
Implementation: yes, that should be protected. People seem to not like that here, though.
Funnily enough, this idea that the method matters is part of what separates Trump's supporters from sane people.
There are two distinctions here: how things work now vs. how they should work, and also how it works when a human does something vs. when an LLM is used to generate something imitating the human work.
A human has limited time and memory. Human time is valuable; computer time is not. For a human, memorizing something takes time.
When a human is inspired by a work and writes something based on that, he invests a lot of time and energy into it. Therefore people have decided that this creative output should be protected by the law.
Also a human is limited by how much he can remember from the original work. Even if writing what you described, he would inevitably fall back on his own life experiences, opinions, attitude, ways of thinking, etc.
When an LLM is used, it generates a statistical mashup of the works it ingested during training. No part of this process has any intrinsic value; it literally only costs what the electricity does. And it's almost infinitely scalable. The law might not call the output derivative because it was written at a time when this kind of mechanical derivation was not feasible.
BTW, I like that you spell it GAI. "General artificial intelligence" feels more natural to say. I wonder if there's some rule of English I don't know that makes AGI more correct, or if all the highly educated people are just trying to avoid sounding like they're saying "gay".
> Which hardware company is going to build your DRM
The ones that build the software. Apple. Oracle.
MIT/BSD is like putting pristine steel out in the rain. Rust will get to it before long. GPL is like painting it to protect it for generations to come.
Better hope you never have a single vulnerability or someone's going to post it on the internet.
And then you'd have to put it on your servers, not AWS or some other third party with no obligation not to release it or start using it themselves. Meanwhile the open source people have no secrets and can use a commodity CDN or let the users run it locally.
> The ones that build the software. Apple. Oracle.
For DRM that would attempt to prevent you from running it, that doesn't help. People would install iOS on Samsung phones and Oracle's database on third party commodity hardware.
For DRM that would attempt to prevent anyone from making a single copy of the software: when has that worked, even today, when breaking it is illegal? Meanwhile, if it's not illegal, you'd have to contend with multinational corporations with full-on clean rooms and state-of-the-art equipment, and if any one of them can extract a single key from a single device, that's it.
Only if the technology stagnates (or other conditions where a capital advantage proves useless). If it’s a moving target you don’t need any of this. Just basic security measures the likes of which protect the source code of most closed-source software today from everyone but the likes of a handful of nation states.
> People would install iOS on Samsung phones and Oracle's database on third party commodity hardware
Some people might. Most wouldn’t. Certainly not the vast majority of the people willing to pay for apps and hence the market for devs.
But they are still based on the training data. An untrained model is a random noise generator. A model trained exclusively on GPL code will therefore obviously only generate useful code thanks to the GPL input. The output is literally derived from the "training data" input and the prompt.
Now, given that the training input is more substantial than the prompt by orders of magnitude, the prompt is basically irrelevant.
So what the license of the output should be based on is the training data. The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so any license is only a minority of the input. It's just manipulation.
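For a sense of the orders of magnitude (both figures below are ballpark assumptions, not stats from any real model):

    # Ballpark comparison of training input vs. prompt input.
    # Both numbers are assumptions for illustration only.
    corpus_tokens = 10 ** 13   # assumed web-scale training corpus (~10T tokens)
    prompt_tokens = 10 ** 3    # a generous, detailed prompt (~1k tokens)
    print(f"training input outweighs the prompt ~{corpus_tokens / prompt_tokens:.0e} to 1")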
That's assuming both that defending requires the same level of resources as attacking and that the company trying to keep the changes a secret has a capital advantage.
The current attempts to do these things are performed by multibillion-dollar corporations and then cracked by individual teenagers, and now you're adding the likes of Google, Amazon, Facebook, Samsung, etc. to the list of attackers. Apple is currently slightly bigger than any one of them, but certainly not bigger than all of them put together, plus all of the teenagers and foreign governments in the world.
> Some people might. Most wouldn’t. Certainly not the vast majority of the people willing to pay for apps and hence the market for devs.
Samsung is a large conglomerate with in-house experience making modern electronics down to having their own fabs and making their own CPUs and flash. They would hand an iPhone to their techs, tell them to extract the code and then offer it as a checkbox to install iOS when you buy one of their phones. Nobody would choose that over the same phone with Android?
For that matter Apple would be attempting to use the same system to stop people from copying apps, but then competitors would do the same thing there, and then who is buying an iPhone that tries to charge you for apps that everyone else has extracted and made available for free?
An obvious practical problem with this is that the licenses are variously incompatible with one another:
https://en.wikipedia.org/wiki/License_compatibility
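To make it concrete, some pairs of open-source inputs have no common output license at all; a toy illustration (the real matrix in the article above is far larger):

    # Toy illustration: license pairs with no common output license.
    # A tiny, deliberately incomplete table; see the linked article for the full picture.
    NO_COMMON_OUTPUT = {
        frozenset({"GPL-2.0-only", "Apache-2.0"}),  # the classic incompatible pair
        frozenset({"GPL-2.0-only", "GPL-3.0"}),     # even two GPL versions can conflict
    }

    def jointly_usable(a, b):
        return frozenset({a, b}) not in NO_COMMON_OUTPUT

    print(jointly_usable("GPL-2.0-only", "Apache-2.0"))  # False
    print(jointly_usable("MIT", "GPL-3.0"))              # True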
> The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so any license is only a minority of the input.
Whether it's an intelligent entity or not doesn't really enter into it. The real question is whether the output is taking enough from some particular input to make it a derivative. Which ought to depend on what a given output actually looks like.