
Using LLMs at Oxide

(rfd.shared.oxide.computer)
694 points steveklabnik | 5 comments
csb6 ◴[] No.46179547[source]
Strange to see no mention of potential copyright violations in LLM-generated code (e.g. LLMs reproducing code from GitHub verbatim without respecting the license). I would think that would be a pretty important consideration for any software development company, especially one that produces so much free software.
replies(4): >>46179678 #>>46179797 #>>46179941 #>>46188231 #
fastball ◴[] No.46179941[source]
Has anything like this worked its way through the courts yet?
replies(1): >>46180382 #
adastra22 ◴[] No.46180382[source]
Yes, training is considered fair use, and output is non-copyrightable / public domain. With many asterisks and footnotes, of course.
replies(1): >>46180409 #
1. Madmallard ◴[] No.46180409[source]
Don't see how output being public domain makes sense when they could be outputting copyrighted code.

Shouldn't the rights extend forward and simply require the LLM code to be deleted?

replies(2): >>46180495 #>>46180516 #
2. menaerus ◴[] No.46180495[source]
First, you have to prove that it produced the copyrighted code. The question is what counts as copyrighted code in the first place. A literal copy-paste from the source is easy to identify, but I think 99% of the time this isn't the case.
3. adastra22 ◴[] No.46180516[source]
With many asterisks and footnotes. One of which is that if it literally output the exact code, of course that would be copyright infringement. Something that closely resembled it but with minor changes would be a gray area.

Those kinds of cases, although they do happen, are exceptional. A typical output that does not line-for-line resemble a single training input is considered a new, but non-copyrightable, work.

replies(1): >>46183504 #
4. vegardx ◴[] No.46183504[source]
(I'm not a lawyer)

You should be careful about speaking in absolute terms when talking about copyright.

There is nothing that prevents multiple people from owning copyright to identical works. This is also why copyright infringement is such a mess to litigate.

I'd also be interested in knowing why you think code generated by LLMs can't be copyrighted. That's quite a statement.

There's also the problem with copyright law and different jurisdictions.

replies(1): >>46188198 #
5. adastra22 ◴[] No.46188198{3}[source]
It is the official stance of the US Copyright Office.

It was upheld by Thaler v. Perlmutter.

Bartz v. Anthropic and Kadrey v. Meta confirmed with similar rulings.