Also, please don't use the word "learning"; use "creating software using copyrighted materials".
Also, let's think together: how can we use technical measures to prevent AI companies from using our work if the law doesn't work?
The whole point of copyright is to ensure you're paid for your work. AI companies shouldn't pirate, but if they pay for your work, they should be able to use it however they please, including training an LLM on it.
If that LLM reproduces your work, then the AI company is violating copyright, but if the LLM doesn't reproduce your work, then you have not been harmed. Claiming harm when you haven't been harmed, because of some philosophical difference of opinion with the AI company, is an abuse of the courts.
I could agree with exceptions for non-commercial activity like scientific research, but AI companies are made for extracting profits and not for doing research.
> AI companies shouldn't pirate, but if they pay for your work, they should be able to use it however they please, including training an LLM on it.
It doesn't work this way. Buying a movie doesn't mean you can sell goods featuring the movie's characters.
> then you have not been harmed.
I am harmed because fewer people will buy the book if they can simply get an answer from an LLM. Fewer people will hire me to write code if an LLM trained on my code can do it. Maybe instead of books we should start making applications that protect the content and do not allow copying text or taking screenshots. And instead of open-source code we should provide binary WASM modules.
And the harm you describe is not a recognized harm. You don't own information, you own creative works in their entirety. If your work is simply a reference, then the fact being referenced isn't something you own, thus you are not harmed if that fact is shared elsewhere.
It is an abuse of the courts to attempt to prevent people who have purchased your works from using those works to train an LLM. It's morally wrong.
No. The point of copyright is that the author gets to decide under what terms their works are copied. That's the essence of copyright. In many cases, authors will happily sell you a copy of their work, but they're under no obligation to do so. They can claim a copyright and then never release their work to the general public. That's perfectly within their rights, and they can sue to stop anybody from distributing copies.
When you do it for a transformative purpose (turning it into an LLM), it's certainly fair use.
But more importantly, it's ethical to do so, as the agreement you've made with the person you've purchased the book from included permission to do exactly that.
If the author didn't want their work to be included in an LLM, they should not have sold it, just like if an author didn't want their work to inspire someone else's work, they should not have sold it.
People don't view moral issues in the abstract.
A better perspective on this: individual humans created the works that megacorps are training on, for free or for the price of a single book, to create models that replace those individuals.
The megacorps are only partially replacing individuals now, but when the models get good enough they could replace humans entirely.
When that future arrives, will you still be siding with them, or with individual creators?
People already use pirated software for product creation.
Hypothetical:
I know a guy who learned Photoshop on a pirated copy. He went on to be a graphic designer. All his earnings are ‘proceeds from crime’.
He never used the pirated software to produce content.
If that were the case, then this court case would not be ongoing.
Those damn kind readers and libraries, giving their single copy away when they only paid for a single copy.
Reasonable minds could debate the ethics of how the material was used, but this ruling judged the usage to be legal and fair use. The only problem is that the material was, in effect, stolen.
Sorry for the long quote, but basically this, yeah. A major point of free software is that creators should not have the power to impose arbitrary limits on the users of their works. It is unethical.
It's why the GPL allows the user to disregard any additional conditions, why it's viral, and why the FSF spends so much effort on fighting "open source but..." licenses.
I could, right now, in just a few minutes, go download a perfectly functional pirated copy of nearly any Adobe program, nearly any Microsoft program, and a whole range of books and movies, yet I see zero real financial troubles affecting any of the companies behind these. Quite the contrary, in fact.
You are allowed to buy and scan books, and then use those scanned books to create products. I guess you are also allowed to pirate books and use the knowledge to create products if you are willing to pay the damages to the rights holders for copyright violations.
If you watch a YouTube video to learn something and it's later taken down for using copyrighted images, you learned from illegal content.