Define policy forbidding use of AI code generators

(github.com)

551 points todsacerdoti | 1 comments | 25 Jun 25 23:26 UTC

benlivengood ◴[26 Jun 25 00:21 UTC] No.44383064[source]▶

Open source and libre/free software are particularly vulnerable to a future where AI-generated code is ruled to be either infringing or public domain.

In the former case, disentangling AI-edits from human edits could tie a project up in legal proceedings for years and projects don't have any funding to fight a copyright suit. Specifically, code that is AI-generated and subsequently modified or incorporated in the rest of the code would raise the question of whether subsequent human edits were non-fair-use derivative works.

In the latter case, the license restrictions no longer apply to portions of the codebase, raising similar issues for derived code. A project that is only 98% OSS/FS licensed suddenly has much less leverage in takedowns against companies abusing the license terms, because it has to prove that infringers are definitely using the human-generated and licensed code.

Proprietary software is only mildly harmed in either case; it would require speculative copyright owners to disassemble their binaries and try to make the case that AI-generated code infringed without being able to see the codebase itself. And plenty of proprietary software has public domain code in it already.

AJ007 ◴[26 Jun 25 00:53 UTC] No.44383229[source]▶

>>44383064 #

I understand why experienced developers don't want random AI contributions from no-knowledge "developers" to a project. In any case, if a human reviews AI code line by line, that would tie up humans for years, even ignoring the legal questions.

#1 There will be no verifiable way to prove something was AI generated beyond early models.

#2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects. The only room for debate on that is an apocalypse level scenario where humans fail to continue producing semiconductors or electricity.

#3 If a project successfully excludes AI contributions (not clear how other than controlling contributions to a tight group of anti-AI fanatics), it's just going to be cloned, and the clones will leave it in the dust. If the license permits forking then it could be forked too, but cloning and purging any potential legal issues might be preferred.

There is still a path for open source projects. It will be different. There's going to be much, much more software in the future and it's not going to be all junk (although 99% might be).

basilgohar ◴[26 Jun 25 01:55 UTC] No.44383553[source]▶

>>44383229 #

I feel like this is mostly proofless assertion. I'm aware what you hint at is happening, but the conclusions you arrive at are far from proven or even reasonable at this stage.

For what it's worth, I think AI for code will arrive at a place like how other coding tools sit – hinting, intellisense, linting, maybe even static or dynamic analysis, but I doubt NOT using AI will be a critical asset to productivity.

Someone else in the thread already mentioned it's a bit of an amplifier. If you're good, it can make you better, but if you're bad it just spreads your poor skills like a robot vacuum spreads animal waste.

otabdeveloper4 ◴[26 Jun 25 05:53 UTC] No.44384544[source]▶

>>44383553 #

IMO LLMs are best when used as locally-run offline search engines. This is a clear and obvious disruptive technology.

But we will need to get a lot better at finetuning first. People don't want generalist LLMs, they want "expert systems".

danielbln ◴[26 Jun 25 08:25 UTC] No.44385353[source]▶

>>44384544 #

Speak for yourself, I prefer generalist LLMs. Also, the bitter lesson of ML applies.

otabdeveloper4 ◴[27 Jun 25 10:54 UTC] No.44395713[source]▶

>>44385353 #

> I prefer generalist LLMs.

No you don't. You personally probably don't need a Pokemon encyclopedia or Bengali spell checking in your daily LLM usage.

These are the kinds of things you're paying for when you're using the newer models with huge parameter counts.

danielbln ◴[27 Jun 25 12:03 UTC] No.44396107[source]▶

>>44395713 #

while I probably won't need pokemon knowledge from an llm, I still think broad pre-training enables unpredictable cross-domain knowledge transfer - seemingly irrelevant data can contribute to reasoning abilities that improve specialized tasks. and most developers work across domains anyway, so better to start general and post-train than lose emergent capabilities from narrow foundations.

even within 'coding' there's huge breadth. you might need to understand database schemas, api documentation, domain-specific algorithms, legacy system constraints, or business logic from completely different industries. a narrowly-tuned model might excel at leetcode problems but struggle when you need to build a financial trading system or whatever.

otabdeveloper4 ◴[28 Jun 25 13:31 UTC] No.44404595[source]▶

>>44396107 #

We need 1000 different specialized LLMs, not one huge LLM with 1000 times more parameters.

Whoever figures this out from a technical point of view will probably win this AI bubble iteration.

Also, the big money is in local inference, not cloud chat. The pendulum will swing back towards personal computing. Whoever figures out the solution to this technical problem will also win.

(I have no use whatsoever for C# or Java knowledge. I would pay for LLM weights that can figure out a complex Nix codebase, preferably my own. I want to buy the weights though; cloud is a complete non-starter here.)
