
451 points | croes | 1 comment
mattxxx ◴[] No.43962976[source]
Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology

2. Matches a potentially outdated, strict interpretation of copyright law

My opinion: using copyrighted data to train models does seem classically illegal. Despite that, humans can read a book, get inspiration, and write a new book without being litigated against. Looking at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is, and will continue to be, so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use case, and then change them.

replies(19): >>43963017 #>>43963125 #>>43963168 #>>43963214 #>>43963243 #>>43963311 #>>43963423 #>>43963517 #>>43963612 #>>43963721 #>>43963943 #>>43964079 #>>43964280 #>>43964365 #>>43964448 #>>43964562 #>>43965792 #>>43965920 #>>43976732 #
palmotea[dead post] ◴[] No.43963168[source]
[flagged]
jobigoud ◴[] No.43963464[source]
We are talking about the rights of the humans training the models and the humans using the models to create new things.

Copyright only comes into play on publication. It's only concerned about publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.

replies(5): >>43963564 #>>43964130 #>>43964131 #>>43964631 #>>43965405 #
palmotea ◴[] No.43964631[source]
>>> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

>> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact that a human has them. Otherwise it would be murder to recycle a car.

> We are talking about the rights of the humans training the models and the humans using the models to create new things.

Then that's even easier, because it prevents appeals to things humans do, like learning, from muddying the waters.

If "training the models" entails loading copyrighted works into your system (e.g. encoding them during training), you've copied them into a retrieval system and violated copyright under established precedent. And people have prompted verbatim copyrighted text out of well-known LLMs, which makes it even clearer.

And then, to defend LLM training, you're left with BS akin to claiming an ASCII-encoded copy of a book is not a copyright violation because the book is paper and ASCII is numbers.
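The point of the ASCII analogy can be made concrete: encoding text as character codes is trivially reversible, so the "numbers" carry exactly the same work as the page. A minimal sketch (the sample sentence is just an illustration):

```python
# Encoding a text as ASCII code points loses nothing: the mapping is a
# lossless round trip, so "it's just numbers" changes the representation,
# not the content.
text = "It was the best of times, it was the worst of times."

codes = [ord(c) for c in text]               # the "just numbers" form
restored = "".join(chr(n) for n in codes)    # back to the original text

assert restored == text                      # lossless round trip
print(codes[:5])                             # [73, 116, 32, 119, 97]
```

The same argument applies to any invertible transformation of a work, which is why "it's stored differently" has never been a copyright defense.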