494 points todsacerdoti | 5 comments
benlivengood ◴[] No.44383064[source]
Open source and libre/free software are particularly vulnerable to a future where AI-generated code is ruled to be either infringing or public domain.

In the former case, disentangling AI edits from human edits could tie a project up in legal proceedings for years, and projects don't have the funding to fight a copyright suit. Specifically, code that is AI-generated and subsequently modified or incorporated into the rest of the codebase would raise the question of whether those human edits were non-fair-use derivative works.
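
One partial mitigation is to tag AI-assisted commits up front so they can be audited, or excised, later. A minimal sketch in Python, assuming contributors add an "AI-Assisted: yes" git trailer to such commits (the trailer name is hypothetical; there is no standard):

    import subprocess

    def ai_assisted_commits(repo="."):
        # List commits carrying the assumed "AI-Assisted" trailer.
        # %H prints the hash; %(trailers:key=...,valueonly) prints the
        # trailer's value, or nothing when the commit lacks one.
        out = subprocess.run(
            ["git", "-C", repo, "log",
             "--format=%H\t%(trailers:key=AI-Assisted,valueonly)"],
            capture_output=True, text=True, check=True).stdout
        return [sha for sha, _, value in
                (line.partition("\t") for line in out.splitlines())
                if value.strip()]

Anything this returns is the set of commits you'd have to scrutinise or re-implement if AI-generated code were ruled infringing.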

In the latter case, the license restrictions no longer apply to portions of the codebase, raising similar issues for derived code. A project that is only 98% OSS/FS licensed suddenly has much less leverage in takedowns against companies abusing the license terms, since it would have to prove that infringers are definitely using the human-authored, licensed code.

Proprietary software is only mildly harmed in either case; prospective copyright claimants would have to disassemble its binaries and try to make the case that AI-generated code infringed, without being able to see the codebase itself. And plenty of proprietary software has public domain code in it already.

replies(8): >>44383156 #>>44383218 #>>44383229 #>>44384184 #>>44385081 #>>44385229 #>>44386155 #>>44387156 #
1. strogonoff ◴[] No.44386155[source]
People sometimes miss that copyleft is powered by copyright. Copyleft (which means Linux, Blender, and plenty of other goodness) needs the ability to impose some rules on what users do with your work, presumably in the interest of the common good. That ability implies IP ownership.

This does not mean it is fair game for powerful interests to abuse copyright with ever-increasing terms and enforcement overreach. That harms the common interest.

However, it does mean that abusing copyright from the other side, by denouncing the core ideas of IP ownership (which is now rather in the interest of certain companies, and of capital heavily invested in certain fashionable but not yet profitable startups, built around IP expropriation), harms the common interest just as much.

replies(1): >>44386212 #
2. ben_w ◴[] No.44386212[source]
While this is a generally true statement (and has echoes in other areas like sovereign citizens), GenAI may make copyright (and copyleft) economically redundant.

While the AI we have now is not good enough to make an entire operating system when asked*, if and when it can, the benefits of all current licensing models evaporate. It won't matter whether that model is proprietary with no source, or GPL, or MIT, because by that point anyone else can reproduce your OS for the cost of tokens without ever touching your code.

But as we're not there yet, I agree with @benlivengood that (most**) OSS projects must treat GenAI code as if it's unusable.

* At least, not a modern OS. I've not tried getting any model to output a tiny OS that would fit in a C64, and while I doubt they can currently do this, it is a bet I might lose, whereas I am confident all models would currently fail at e.g. reproducing Windows XP.

** I think MIT licensed projects can probably use GenAI code, since they're not trying to require derivatives to follow the same licence, but I'm not a lawyer and this is just my barely informed opinion from reading the licences.

replies(1): >>44387145 #
3. strogonoff ◴[] No.44387145[source]
I have a few sociophilosophical quibbles about the impact of this, but to focus on a practical part:

> by that point anyone else can reproduce your OS for whatever the cost of tokens is without ever touching your code.

Do you think the cost of tokens will remain low enough once these companies, currently operating at a loss, have to become profitable, and that it really will be “anyone else”? Or would it be limited to “big tech” or a select few corporations that can pay them a non-trivial amount of money?

Do you think it would mean they essentially sell GPL’ed code for proprietary use? Would it not affect FOSS, which has until now been partially powered by the promise to contributors that their (often voluntary) work would remain for public benefit?

Do you think someone would create, make public, and gather so much contributor effort for something on the scale of Linux if they knew it would be open to scraping by an intermediary who can sell it, at whatever price they choose, to companies that are then free to call it their own and repackage it commercially without contributing back, providing their source, or crediting the original authors in any way?

replies(2): >>44388023 #>>44391120 #
4. Pet_Ant ◴[] No.44388023{3}[source]
> Do you think the cost of tokens will remain low enough once these companies, currently operating at a loss, have to become profitable

New techniques are coming, new hardware processes are being developed, and the incremental unit cost is low. Once they saturate the labs, they'll start selling to consumers, until the price approaches the cost of a bucket of sand plus the electricity to power a light bulb.

5. ben_w ◴[] No.44391120{3}[source]
> Do you think the cost of tokens will remain low enough once these companies, currently operating at a loss, have to become profitable, and that it really will be “anyone else”? Or would it be limited to “big tech” or a select few corporations that can pay them a non-trivial amount of money?

When considering current models, it's not in these companies' power to prevent it:

DeepSeek demonstrated that big models can be trained for a modest budget, and inference is mostly constrained by memory access rather than compute. So if we had smartphones with a terabyte of very-high-bandwidth RAM feeding something like a current-generation Apple NPU, a model like DeepSeek R1 would run locally at (back-of-the-envelope calculation) about real time, and drain the battery in half an hour of continuous use.
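
For concreteness, that back-of-the-envelope goes something like the sketch below; every number is an assumption of mine (R1's roughly 37B active parameters per token, 8-bit weights, a hypothetical 400 GB/s of phone memory bandwidth), not a measurement:

    # Decode speed of a memory-bandwidth-bound model: generating each token
    # means streaming the active weights through the processor once.
    active_params = 37e9     # DeepSeek R1 activates ~37B of its 671B params
    bytes_per_param = 1      # assume 8-bit quantised weights
    bandwidth = 400e9        # hypothetical phone RAM bandwidth, bytes/s

    tokens_per_second = bandwidth / (active_params * bytes_per_param)
    print(f"~{tokens_per_second:.0f} tokens/s")  # ~11 tok/s, about reading speed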

But current models are not good enough, so the real question is: "who will hold what power when such models hypothetically are created?", and I have absolutely no idea.

> Do you think someone would create, make public, and gather so much contributor effort for something on the scale of Linux if they knew it would be open to scraping by an intermediary who can sell it, at whatever price they choose, to companies that are then free to call it their own and repackage it commercially without contributing back, providing their source, or crediting the original authors in any way?

Consider it differently: how much would it cost to use an LLM to reproduce all of Linux?

I previously rough-estimated that at $230/megatoken of (useful final product) output, an AI would be energy-competitive vs. humans consuming calories to live: https://news.ycombinator.com/item?id=44304186

As I don't have specifics, I need to Fermi-estimate this:

I'm not actually sure how many lines of code an OS (with or without apps) contains, but I hear a lot of numbers in the range of 10-50 million. Let's say 50 Mloc.

I don't know the tokens per line, so I'm going to guess 10.

50e6 lines * 10 tokens/line * $230/(1e6 tokens) = $115,000
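
Or, as a script (the same guesses as above, plus the $230/megatoken figure from the linked comment):

    lines_of_code = 50e6       # guessed size of a modern OS
    tokens_per_line = 10       # guessed LLM tokens per line of code
    usd_per_megatoken = 230    # energy-parity price from the linked estimate

    total_megatokens = lines_of_code * tokens_per_line / 1e6  # 500 Mtok
    print(f"${total_megatokens * usd_per_megatoken:,.0f}")    # $115,000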

There's no fundamental reason for $230/megatoken beyond it being the point where the AI becomes economically preferable to feeding a human who is doing the work for free and just needs to be kept from starving to death, even a human who has figured out how to directly metabolise electricity, which is much cheaper than food. On the one hand, $230 is at the very expensive end of current models; on the second hand, see the previous point about running DeepSeek R1 on a phone processor with more RAM and bandwidth to match; on the third hand*, see the other previous point that current models just aren't good enough to bother.

So it's currently not available at any price, but once the quality is good enough, even charging a rate that's expensive by today's standards makes all humans unemployable.

* Insert your own joke about off-by-one errors