
494 points todsacerdoti | 17 comments
1. wyldfire ◴[] No.44382903[source]
I understand where this comes from, but I think it's a mistake. I agree it would be nice if there were "well settled law" regarding AI and copyright, but there are relatively few rulings and next to zero legislation on which to base a position.

In addition to a policy to reject contributions from AI, I think it may make sense to point out places where AI generated content can be used. For example - how much of QEMU project's (copious) CI setup is really stuff that is critical content to protect? What about ever-more interesting test cases or environments that could be enabled? Something like "contribute those things here instead, and make judicious use of AI there, with these kinds of guard rails..."

replies(5): >>44382957 #>>44382958 #>>44383166 #>>44383312 #>>44383370 #
2. dclowd9901 ◴[] No.44382957[source]
What's the risk of not doing this? Better code but slower velocity for an open source project?

I think that particular brand of risk makes sense for this particular project, and the authors don't seem particularly negative toward GenAI as a concept; they just don't want to walk through a "one-way door" with it.

replies(1): >>44384090 #
3. kazinator ◴[] No.44382958[source]
There is a well settled practice in computing that you just don't plagiarize code. Even a small snippet. Even if copyright law would consider such a small thing "fair use".
replies(2): >>44383103 #>>44383321 #
4. 9283409232 ◴[] No.44383103[source]
This isn't 100% true, which means it isn't well settled. Have people already forgotten Google v. Oracle? Google ended up winning after years of judgments going back and forth, and there are four factors used to determine whether something is or isn't fair use; generative AI would fail at a few of those.
replies(2): >>44383213 #>>44383466 #
5. pavon ◴[] No.44383166[source]
This isn't like some other legal questions that go decades before being answered in court. There are dozens of cases working through the courts today that will shed light on some aspects of the copyright questions within a few years. QEMU has made great progress over the last 22 years without the aid of AI, waiting a few more years isn't going to hurt them.
6. ◴[] No.44383213{3}[source]
7. dijksterhuis ◴[] No.44383312[source]
A simpler solution is just to wait until the legal situation is clearer.

QEMU is (mostly) GPL 2.0 licensed, meaning (most) code contributions need to be GPL 2.0 compatible [0]. Let's say, hypothetically, that some patch adds a gen-AI code contribution which is derived/memorised/copied from non-GPL-compatible code [1]. Then, hypothetically, a legal case sets the precedent that gen-AI FOSS code must carry the license of the original derived/memorised/copied code. QEMU maintainers would probably need to roll back all of those incompatible contributions. By that time, those contributions could have picked up downstream callers, which would also need to be rewritten (even in CI code).

It might be possible to first say "only CI code which is clearly labelled as 'DO NOT RE-USE: AI' or some such". But the maintainers would still need to go through and rewrite those parts of the CI code if this hypothetical plays out. Plus it adds extra work to reviews and merge processes etc.
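If that hypothetical played out, at least locating the labelled sections would be mechanical. A rough Python sketch, assuming the "DO NOT RE-USE: AI" label from the paragraph above appears verbatim in a comment:

```python
import re
from pathlib import Path

# Label quoted from the hypothetical above; a real project would pick its own.
LABEL = re.compile(r"DO NOT RE-USE: AI")

def labelled_lines(path: str) -> list[int]:
    """Return 1-based line numbers carrying the label, so the surrounding
    code can be found and rewritten by hand."""
    text = Path(path).read_text(errors="ignore")
    return [n for n, line in enumerate(text.splitlines(), 1) if LABEL.search(line)]
```

This only finds the labels, of course; deciding how far the "tainted" code reaches (and rewriting its downstream callers) would still be manual work, which is rather the commenter's point.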

It's just less work and less drama for everyone involved to say "no thank you (for now)".

----

Caveat: IANAL, and licensing is not my specific expertise (but I would quite like it to be one day).

[0]: https://github.com/qemu/qemu/blob/master/LICENSE

[1]: e.g. no license / MPL / Apache / Artistic / Creative Commons https://www.gnu.org/licenses/license-list.html#NonFreeSoftwa...

8. bfLives ◴[] No.44383321[source]
> There is a well settled practice in computing that you just don't plagiarize code. Even a small snippet.

I think the way many developers use StackOverflow suggests otherwise.

replies(1): >>44383415 #
9. hinterlands ◴[] No.44383370[source]
I think you need to read between the lines here. Anything you do carries legal risk, but this particular risk seems acceptable to many of the world's largest and richest companies. QEMU isn't special, so if the maintainers are taking this position, it's most likely because they don't want to deal with LLM-generated code for some other reason, and legal risk is convenient cover that heads off endless arguments on mailing lists.

We do that in corporate environments too. "I don't like this" -> "let me see what lawyers say" -> "a-ha, you can't do it because legal says it's a risk".

10. kazinator ◴[] No.44383415{3}[source]
In the first place, in order to post to StackOverflow, you are required to have the copyright over the code, and be able to grant them a perpetual license.

They redistribute the material under the CC BY-SA 4.0 license. https://creativecommons.org/licenses/by-sa/4.0/

This allows visitors to use the material, with attribution. One can, of course, use the ideas in a SO answer to develop one's own solution.
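In practice, meeting the attribution requirement usually looks like a source comment above the borrowed snippet. A hypothetical example — the function is deliberately trivial, and the URL and author fields are placeholders, not a real answer:

```python
# Adapted from a Stack Overflow answer, licensed CC BY-SA 4.0.
# Source: https://stackoverflow.com/a/<answer-id>   (placeholder)
# Author: <poster's username>                       (placeholder)
def chunked(seq, n):
    """Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]
```

Whether a comment like this satisfies CC BY-SA in full is itself a legal question (ShareAlike has implications beyond attribution), which is one reason the "develop your own solution from the ideas" route is safer.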

replies(2): >>44384260 #>>44385322 #
11. kazinator ◴[] No.44383466{3}[source]
Google vs. Oracle was about whether APIs are copyrightable, which is an important issue that speaks to antitrust. Oracle wanted the interface itself to be copyrighted so that even if someone reproduced the API from a description of it, it would infringe. The implication being that components which clone an API would be infringing, even though their implementation is original, discouraging competitors from making API-compatible components.

My comment didn't say anything about whether the output of AI is fair use or not; rather, that something qualifying as fair use (no matter where you are getting the material from) doesn't ipso facto mean that copy-paste is considered okay.

Every employer I ever had discouraged copy and paste from anywhere as a blanket rule.

At least, that had been the norm, before the LLM takeover. Obviously, organizations that use AI now for writing code are plagiarizing left and right.

replies(1): >>44383726 #
12. overfeed ◴[] No.44383726{4}[source]
> Google vs. Oracle was about whether APIs are copyrightable, which is an important issue that speaks to antitrust.

In addition to the Structure, Sequence and Organization claims, the original filing included a claim for copyright violation on 9 identical lines of code in rangeCheck(). This claim was dropped after the judge asked Oracle to reduce the number of claims, which forced Oracle to pare down to their strongest claims.

13. mrheosuper ◴[] No.44384090[source]
>Better code but slower velocity for an open source project

Better code and "AI-assisted coding" are not mutually exclusive.

14. behringer ◴[] No.44384260{4}[source]
Show me the professional code base with the attribution to stack overflow and I'll eat my hat.
replies(1): >>44385220 #
15. _flux ◴[] No.44385220{5}[source]
Obviously I cannot show the code base, but when I pick up a pre-existing solution from Stack Overflow or elsewhere (though that is quite rare), I do add a comment linking to the source: after all, in the case of SO, the discussion there might be interesting for future maintainers of the function.

I just checked, though, and the code base I'm now working on has eight Stack Overflow links. Not all of them were even written by me, according to a quick check with git blame and git log -S.
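A count like that can also be reproduced mechanically across a checkout. A small Python sketch — the suffix list and the link pattern are assumptions:

```python
import re
from pathlib import Path

# Matches Stack Overflow URLs appearing anywhere in source text.
SO_LINK = re.compile(r"https?://stackoverflow\.com/\S+")

def count_so_links(root: str, suffixes=(".c", ".h", ".py")) -> int:
    """Count Stack Overflow links in files under root with the given suffixes."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += len(SO_LINK.findall(path.read_text(errors="ignore")))
    return total
```

git log -S, by contrast, answers the different question of *who added* each link, which is what the blame check above relies on.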

replies(1): >>44385242 #
16. graemep ◴[] No.44385242{6}[source]
I always do too, for exactly the same reason.
17. graemep ◴[] No.44385322{4}[source]
> you are required to have the copyright over the code, and be able to grant them a perpetual license.

Which Stack Overflow cannot verify. The code might be pulled from some other code base, or generated by AI (I would bet a lot of it is now).