
451 points croes | 2 comments
mattxxx ◴[] No.43962976[source]
Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology

2. Matches a potentially-outdated, strict interpretation of copyright law

My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, humans can read a book, get inspiration, and write a new book without being litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use-case, and that we should change them.

replies(19): >>43963017 #>>43963125 #>>43963168 #>>43963214 #>>43963243 #>>43963311 #>>43963423 #>>43963517 #>>43963612 #>>43963721 #>>43963943 #>>43964079 #>>43964280 #>>43964365 #>>43964448 #>>43964562 #>>43965792 #>>43965920 #>>43976732 #
palmotea[dead post] ◴[] No.43963168[source]
[flagged]
jobigoud ◴[] No.43963464[source]
We are talking about the rights of the humans training the models and the humans using the models to create new things.

Copyright only comes into play on publication. It's only concerned about publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.

replies(5): >>43963564 #>>43964130 #>>43964131 #>>43964631 #>>43965405 #
bgwalter ◴[] No.43964130[source]
Does the distinction matter? If humans build a machine that uses so much oxygen that the oxygen levels on earth drop by half, can they say:

"Humans are allowed to breathe, so our machine is too, because it is operated by humans!"

replies(1): >>43964279 #
TeMPOraL ◴[] No.43964279[source]
Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Point being, laws aren't some God-ordained rules, beautiful in their fractal recursive abstraction, perfectly covering everything that will ever happen in the universe. No, laws are more or less crude hacks that deal with here and now. Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI. This is a new situation, and laws need to be updated to cover it.

replies(1): >>43964747 #
palmotea ◴[] No.43964747[source]
> Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

> Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI.

The laws are not "entirely ill-equipped to deal with generative AI," unless your interests lie in breaking them. All the hand-waving about the laws being "questionable" and "entirely ill-equipped" is just noise.

Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have a reasonable interest in getting paid for their efforts. Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else, so they can pay as little as possible.

replies(3): >>43965500 #>>43965515 #>>43967544 #
1. TeMPOraL ◴[] No.43965500[source]
> Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts.

That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

> Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case. I'm not begrudging them for raking in heaps of money offering generative AI services, because they're legitimately offering value that's at least commensurate with what they charge (IMHO it's much greater), and that value comes entirely from the work they're uniquely able to do, and any individual work that went into training data contributes approximately zero to it.

(GenAI doesn't rely on any individual work in training data; it relies on the breadth and amount being a notable fraction of humanity's total intellectual output. It so happens that almost all knowledge and culture is subject to copyright, so you couldn't really get to this without stepping on some legal landmines.)

(Also, much like AI companies would like the law to favor them, their opponents in this case would like the law to dictate they should be compensated for their works being used in training data, but compensated way beyond any value their works bring in, which in reality is, again, approximately zero.)

replies(1): >>43966813 #
2. palmotea ◴[] No.43966813[source]
> That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

It's still copyright infringement if I download a pirated movie and never watch it (writing the bytes to the disk == "training" the disk's "model", reading the bytes back == "generating" from the disk's "model").

> That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case.

IMHO, unless you're massively wealthy and/or running a bigcorp, people like you benefit a lot more from copyright than they are harmed by it. In a world without copyright protection, some bigcorp would be able to use its size to extract the value from the works that are out there (i.e. Amazon and Netflix would stop paying royalties instantly, but they'd still have customers because they have the scale to distribute). Copyright just means the little guy who's actually creating has some claim to get some of the value directed back to them.

> and any individual work that went into training data contributes approximately zero to it.

Then cut all those works out of the training set. I don't think it's an excuse that the infringement has to happen on a massive scale to be of value to the generative AI company.