
451 points croes | 10 comments
mattxxx ◴[] No.43962976[source]
Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology
2. Matches a potentially-outdated, strict interpretation of copyright law

My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, humans can read a book, get inspiration, and write a new book without being litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use case, and then change them.

palmotea[dead post] ◴[] No.43963168[source]
[flagged]
jobigoud ◴[] No.43963464[source]
We are talking about the rights of the humans training the models and the humans using the models to create new things.

Copyright only comes into play on publication. It's only concerned with publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.

bgwalter ◴[] No.43964130[source]
Does the distinction matter? If humans build a machine that uses so much oxygen that the oxygen levels on earth drop by half, can they say:

"Humans are allowed to breathe, so our machine is too, because it is operated by humans!"

1. TeMPOraL ◴[] No.43964279[source]
Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Point being, laws aren't some God-ordained rules, beautiful in their fractal recursive abstraction, perfectly covering everything that will ever happen in the universe. No, laws are more or less crude hacks that deal with here and now. Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI. This is a new situation, and laws need to be updated to cover it.

2. palmotea ◴[] No.43964747[source]
> Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

> Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI.

The laws are not "entirely ill-equipped to deal with generative AI," unless your interests lie in breaking them. All the hand-waving about the laws being "questionable" and "entirely ill-equipped" is just noise.

Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts. Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

3. TeMPOraL ◴[] No.43965500[source]
> Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts.

That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

> Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case. I'm not begrudging them for raking in heaps of money offering generative AI services, because they're legitimately offering value that's at least commensurate (IMHO it's much greater) to what they charge, and that value comes entirely from the work they're uniquely able to do, and any individual work that went into training data contributes approximately zero to it.

(GenAI doesn't rely on any individual work in training data; it relies on the breadth and amount being a notable fraction of humanity's total intellectual output. It so happens that almost all knowledge and culture is subject to copyright, so you couldn't really get to this without stepping on some legal landmines.)

(Also, much like AI companies would like the law to favor them, their opponents in this case would like the law to dictate they should be compensated for their works being used in training data, but compensated way beyond any value their works bring in, which in reality is, again, approximately zero.)

4. ben_w ◴[] No.43965515[source]
> Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

Copyright laws were themselves created by the printing press making it easy to duplicate works, whereas previously if you half-remembered something that was just "inspiration".

But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible canon of all of humanity before them. Can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

Now we have AI which are "inspired" (perhaps) by what they read, and half-remember it, in a way that seems similar to pre-printing-press humans sharing stories even if the mechanism is different.

How this is seen according to current law likely varies by jurisdiction; but the law as it is today matters less than what the law will be when the new ones are drafted to account for GenAI.

What that will look like, I am unsure. Could be that for training purposes, copyright becomes eternal… but it's also possible that copyright may cease to exist entirely — laws to protect the entire creative industry may seem good, but if AI displaces all humans from economic activity, will it continue to matter?

5. Jensson ◴[] No.43965733{3}[source]
> But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible canon of all of humanity before them. Can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

That is even worse without copyright, as then every previous work would be free and you would have to compete with better works that are also free for people.

6. palmotea ◴[] No.43966813{3}[source]
> That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

It's still copyright infringement if I download a pirated movie and never watch it (writing the bytes to the disk == "training" the disk's "model", reading the bytes back == "generating" from the disk's "model").

> That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case.

IMHO, unless you're massively wealthy and/or running a bigcorp, people like you benefit a lot more from copyright than they are harmed by it. In a world without copyright protection, some bigcorp will be able to use its size to extract the value from the works that are out there (i.e. Amazon and Netflix will stop paying royalties instantly, but they'll still have customers because they have the scale to distribute). Copyright just means the little guy who's actually creating has some claim to get some of the value directed back to them.

> and any individual work that went into training data contributes approximately zero to it.

Then cut all those works out of the training set. I don't think it's an excuse that the infringement has to happen on a massive scale to be of value to the generative AI company.

7. palmotea ◴[] No.43966984{3}[source]
> Copyright laws were themselves created by the printing press making it easy to duplicate works, whereas previously if you half-remembered something that was just "inspiration".

Eh. I don't know the history, but my understanding was they were created because the printing press allowed others to deny the original creators the profits from their work, and direct those profits to others who had no hand in it.

After all, in market terms: a publisher that pays its authors can't compete with another publisher that publishes the same works but without paying any authors. A world without copyright is one where some publisher still makes money, but it's a race to the bottom for authors.

> But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible canon of all of humanity before them. Can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

Here's a hole in your thinking: if you like fantasy, would you be content to just re-read Tolkien over and over, forever? Don't you think that'd get boring no matter how good he was?

And empirically, "new creative [people]" manage to compete with Pratchett or Tolkien all the time, as new fantasy works are still being published and read. Do you remember that "Game of Thrones" was a mass cultural phenomenon not too long ago?

8. Suppafly ◴[] No.43967536{4}[source]
>that are also free for people

sounds like a good deal if you're people.

9. Suppafly ◴[] No.43967544[source]
>Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts.

If it were that cut and dried we wouldn't have this conversation at all, so clearly your position isn't objectively true.

10. ben_w ◴[] No.43967772{4}[source]
> A world without copyright is one where some publisher still makes money, but it's a race to the bottom for authors.

This is the case anyway; there are many writers competing for the opportunity to be published, so the publishers have a massive advantage, and it is the technology of printing (and cheap paper) that makes this a one-sided relationship — if every story teller had to be heard in person, with no recordings or reproductions possible, then story tellers would be found in every community, and they would be valued by their community.

> Here's a hole in your thinking: if you like fantasy, would you be content to just re-read Tolkien over and over, forever? Don't you think that'd get boring no matter how good he was?

The examples aren't meant to be exclusive, and Pratchett has a lot of books.

There are far more books on the market right now than a human can read in a lifetime. At some point (we may have already passed it) there will be far more good books on the market than a human can read in a lifetime, at which point it's not quality, it's fashion.

> And empirically, "new creative [people]" manage to compete with Pratchett or Tolkien all the time, as new fantasy works are still being published and read.

At some point, there will be more books at least as good as Pratchett, Tolkien, Le Guin, McCaffrey, Martin, Heinlein, Niven etc. in each genre, than anyone can read.

> Do you remember that "Game of Thrones" was a mass cultural phenomenon not too long ago?

Published: August 1, 1996 — concurrently with Pratchett.

A better example would have been The Expanse. It's worth noting that SciFi has a natural advantage over (high) fantasy or romance, as the nature of speculative science fiction means it keeps considering futures that are rendered as obsolete as the worn-down buttons on the calculator that Hari Seldon was rumoured to keep under his pillow.