US Copyright Office found AI companies breach copyright. Its boss was fired

(www.theregister.com)

452 points croes | 1 comments | 12 May 25 09:49 UTC

jhaile ◴[12 May 25 15:48 UTC] No.43964361[source]▶

One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or have to compensate all copyright holders, other countries (e.g. China) will not follow suit. This will mean that US LLM companies will either fall behind or be too expensive. Which means China and other countries will probably surge ahead in AI, at least in terms of how useful the AI is.

That is not to say that we shouldn't do the right thing regardless, but I do think there is a feeling of "who is going to rule the world in the future?" that underlies governmental decision-making on how much to regulate AI.

replies(10): >>43964511 #>>43964513 #>>43964544 #>>43964546 #>>43964647 #>>43964799 #>>43965877 #>>43966756 #>>43969913 #>>43974233 #

Bjorkbat ◴[12 May 25 16:26 UTC] No.43964799[source]▶

>>43964361 #

I broadly agree in that sure, unfettered access to copyrighted material will make AI more capable, but more capable of what exactly?

For national security reasons I'm perfectly fine with giving LLMs unfettered access to various academic publications, scientific and technical information, that sort of thing. I'm a little more on the fence about proprietary code, but I have a hard time believing there isn't enough code out there already for LLMs to ingest.

Otherwise though, what is an LLM with unfettered access to copyrighted material better at vs one that merely has unfettered access to scientific / technical information + licensed copyrighted material? I would suppose that besides maybe being a more creative writer, the former LLM is far more capable of reproducing copyrighted works.

In effect, the former LLM is a more capable plagiarism machine than the latter, and not necessarily more intelligent, and otherwise doesn't really add any more value. What do we have to gain from condoning it?

I think the argument I'm making is a little easier to see in the case of image and video models. The model that has unfettered access to copyrighted material is more capable, sure, but more capable of what? Capable of making images? Capable of reproducing Mario and Luigi in an infinite number of funny scenarios? What do we have to gain from that? What reason do we have for not banning such models outright? Not like we're really missing out on any critical security or economic advantages here.

replies(1): >>43965158 #

Teever ◴[12 May 25 17:01 UTC] No.43965158[source]▶

>>43964799 #

If common culture is an effective substrate to communicate ideas, as in we can use shared pop culture references to make metaphors to explain complex ideas, then the common culture that large companies have ensnared in excessively long copyrights and trademarks to generate massive profits is a useful thing for an LLM that is designed to convey ideas to have embedded in it.

If I'm learning about kinematics, maybe it would be more effective to have comparisons to Superman flying faster than a speeding bullet, and no amount of dry textbooks and academic papers will make up for the lack of such a comparison.

This is especially relevant when we're talking about science fiction, which has served as the inspiration for many of the leading edge technologies that we use, including stuff like LLMs and AI.

replies(1): >>43966608 #

Bjorkbat ◴[12 May 25 19:21 UTC] No.43966608[source]▶

>>43965158 #

Fair point, we use metaphor to explain and understand a variety of topics, and a lot of those metaphors are best understood through pop culture analogies.

A reasonable compromise then is that you can train an AI on Wikipedia, more-or-less. An AI trained this way will have a robust understanding of Superman, enough that it can communicate through metaphor, but it won't have the training data necessary to create a ton of infringing content about Superman (well, it won't be able to create good infringing content anyway. It'll probably have access to a lot of plot summaries but nothing that would help it make a particularly interesting Superman comic or video).

To me it seems like encyclopedias use copyrighted pop culture in a way that constitutes fair use, and so training on them seems fine as long as they consent to it.
