Universities have this issue too, despite many offering students and staff Grammarly (Gen AI) while also trying to ban Gen AI.
I'm sure that if a contributor working on a feature used Cursor to generate the initial code and then went over it to ensure it works as expected, that would be allowed. This is more for those folks who just want to jam in a quick vibe-coded PR so they can add "contributed to the QEMU project" to their resumes.
Use AI if you want to, but if the person on the other side can tell, and you can't defend the submission as your own, that's a problem.
The actual policy is "don't use AI code generators"; don't try to weasel that into "use it if you want to, but if the person on the other side can tell". That's effectively "it's only cheating if you get caught".
By way of analogy, Open Source projects also typically have policies (whether written or unwritten) that you only submit code you are legally allowed to submit. In theory, you could take a pile of proprietary reverse-engineered code that you have no license to, or a pile of code from another project that you aren't respecting the license of, and submit it anyway, and slap a `Signed-off-by` on it. Nothing will physically stop you, and people might not be able to tell. That doesn't make it OK.
1. My C# code compiled just fine and ran even, but it was convinced that I was missing a closing brace on a lambda near where the exception was occurring. The diff was ... Putting the existing brace on a new line. Confidently stated that was the problem and declared it fixed.
2. It did figure out that an unexpected type was being seen, and implemented a pathway that allowed for it to get to the next error, but didn't look into why that type had gotten there; that was the actual bug, not the unhandled type. So it "fixed" it, but just kicked the can down the road.
3. When figuring out the issue, it just looked at the stack trace. That was it. It was running the compiler itself; it could've just embedded some debug code (like I did) and worked out what the actual issue was, but it didn't even try. The exception was just a NotSupportedException with no extra details to work off of, so adding just a crumb of context would let you solve the issue.
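To illustrate the "crumb of context" point in item 3, here is a minimal Python sketch (the original code was C#, which is not shown here). Everything in it is hypothetical: `NotSupportedError` stands in for the bare NotSupportedException, and `lower_node` stands in for whatever compiler step threw it. The idea is only that re-raising with the offending type and location turns an opaque failure into something you can act on.

```python
class NotSupportedError(Exception):
    """Hypothetical stand-in for the bare NotSupportedException described above."""


def lower_node(node):
    # Hypothetical compiler step that only understands a few node types.
    if type(node) not in (int, str):
        raise NotSupportedError()  # no details at all, like the original exception
    return repr(node)


def lower_node_with_context(node, path):
    # Re-raise the bare failure with the offending type and where it was found.
    try:
        return lower_node(node)
    except NotSupportedError as exc:
        raise NotSupportedError(
            f"unsupported node type {type(node).__name__} at {path}"
        ) from exc
```

With the wrapper in place, the message names the unexpected type, which is the thread you would pull to find out why that type got there in the first place.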
Now, is this the simplest emulator you could throw AI at? No, not at all. But neither is qemu. I'm thoroughly unconvinced that current tools could provide real value on codebases like these. I'm bullish on them for the future, and I use GenAI constantly, but this ain't a viable use case today.
In addition to a policy to reject contributions from AI, I think it may make sense to point out places where AI generated content can be used. For example - how much of QEMU project's (copious) CI setup is really stuff that is critical content to protect? What about ever-more interesting test cases or environments that could be enabled? Something like "contribute those things here instead, and make judicious use of AI there, with these kinds of guard rails..."
The rules regarding the origin of code contributions are rather strict, that is, you can't contribute other people's code unless you can make sure that the licence is appropriate. An LLM may output a copy of someone else's code, sometimes verbatim, without giving you its origin, so you can't contribute code written by an LLM.
I think that particular brand of risk makes sense for this particular project, and the authors don't seem particularly negative toward GenAI as a concept, just going through a "one way door" with it.
I'm very "old man shouting at clouds" about this stuff. I don't want to review code the author doesn't understand and I don't want to merge code neither of us understands.
Getting AI to remind you of the library's API is a fair bit different to having it generate 1000 lines of code you have hardly read before submitting.
If the problem is too many submissions, that would suggest there need to be structures in place to manage that.
Perhaps projects receiving large quantities of updates need triage teams. I suspect most of the submissions are done in good faith.
I can see some people choosing to avoid AI due to the possibility of legal issues. I'm doubtful of the likelihood of such problems, but some people favour eliminating all possibility over minimizing likelihood. The philosopher in me feels like people who think they have eliminated the possibility of something just haven't thought about it enough.
I use the term algorithmic because I think it is stronger than "AI lol". I note they use terms like AI code generator in the policy, which might be just as strong but looks to me unlikely to become a useful legal term (it's hardly "a man on the Clapham omnibus").
They finish with this rather reasonable flourish:
"The policy we set now must be for today, and be open to revision. It's best to start strict and safe, then relax."
No doubt they do get a load of slop but they seem to want to close the legal angles down first and attribution seems a fair place to start off. This playbook looks way better than curl's.
I suspect their concern is not so much whether users own the copyright to AI output but rather the risk that AI will spit out code from its training set that belongs to another project.
Most hypervisors are closed source and some are developed by litigious companies.
This really bothers me. I've had people ask me to do some task except they get AI to provide instructions on how to do the task and send me the instructions, rather than saying "Hey can you please do X". It's insulting.
Why is it archaic if it works? I get there might be other ways to do patch sharing and discussion but what exactly is your problem with email as a transport?
You might as well describe voice and ears as archaic!
Pull your pants up.
In the former case, disentangling AI-edits from human edits could tie a project up in legal proceedings for years and projects don't have any funding to fight a copyright suit. Specifically, code that is AI-generated and subsequently modified or incorporated in the rest of the code would raise the question of whether subsequent human edits were non-fair-use derivative works.
In the latter case the license restrictions no longer apply to portions of the codebase raising similar issues from derived code; a project that is only 98% OSS/FS licensed suddenly has much less leverage in takedowns to companies abusing the license terms; having to prove that infringers are definitely using the human-generated and licensed code.
Proprietary software is only mildly harmed in either case; it would require speculative copyright owners to disassemble their binaries and try to make the case that AI-generated code infringed without being able to see the codebase itself. And plenty of proprietary software has public domain code in it already.
I see those glasses as becoming just a part of me, just like my current dumb glasses are a part of me that enables me to see better, the smart glasses will help me to see AND think better.
My brain was trained on a lot of proprietary code as well, the copyright issues around AI models are pointless western NIMBY thinking and will lead to the downfall of western civilization if they keep pursuing legal what-ifs as an excuse to reject awesome technology.
I'm not sure that's the dunk you think it is. Good for Netflix for making money, but we're drowning in their empty slop content now and worse off for it.
These are the same people who think that "learning to code" is a translation issue they don't have time for, as opposed to experience they don't have.
This ignores the fact that many open source projects do not have the resources to dedicate to a large number of contributions. A side effect of LLM generated code is probably going to be a lot of code. I think this is going to be an issue that is not dependent on the overall quality of the code.
With AI you're going to get job hunters automating PRs for big name projects so they can stick the contributions in their resume.
---
LLM-Generated Contribution Policy
Color is a library full of complex math and subtle decisions (some of them possibly even wrong). It is extremely important that any issues or pull requests be well understood by the submitter and that, especially for pull requests, the developer can attest to the Developer Certificate of Origin for each pull request (see LICENCE).
If LLM assistance is used in writing pull requests, this must be documented in the commit message and pull request. If there is evidence of LLM assistance without such declaration, the pull request will be declined.
Any contribution (bug, feature request, or pull request) that uses unreviewed LLM output will be rejected.
---
I am also adding this to my `SECURITY.md` entries:
---
LLM-Generated Security Report Policy
Absolutely no security reports will be accepted that have been generated by LLM agents.
---
As it's mostly just me, I'm trying to strike a balance, but my preference is against LLM generated contributions.
*"Automated" as in bots and "AI submissions" as in ai-generated code
Basically I think open source has traditionally HEAVILY relied on hidden competency markers to judge the quality of incoming contributions. LLMs turn that entire concept on its head by presenting code that has the competency markers but none of the backing experience. It is a very very jarring experience for experienced individuals.
I suspect that virtual or in person meetings and other forms of social proof independent of the actual PR will become far more crucial for making inroads in large projects in the future.
https://news.artnet.com/art-world/ai-art-us-copyright-office...
https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
I'm pretty sure that this ship has sailed.
#1 There will be no verifiable way to prove something was AI generated beyond early models.
#2 Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects. The only room for debate on that is an apocalypse level scenario where humans fail to continue producing semiconductors or electricity.
#3 If a project successfully excludes AI contributions (not clear how other than controlling contributions to a tight group of anti-AI fanatics), it's just going to be cloned, and the clones will leave it in the dust. If the license permits forking then it could be forked too, but cloning and purging any potential legal issues might be preferred.
There still is a path for open source projects. It will be different. There's going to be much, much more software in the future and it's not going to be all junk (although 99% might be).
My high-level work is absolutely impossible to delegate to AI, but AI really helps with tedious or low-stakes incidental tasks. The other day I asked Claude Code to wire up some graphs and outlier analysis for some database benchmark result CSVs. Something conceptually easy, but takes a fair bit of time to figure out libraries and get everything hooked up unless you're already an expert at csv processing.
This is very, very germane and a very quotable line. And these people have been around from long before LLMs appeared. These are the people who dash off an incomplete idea on Friday afternoon and expect to see a finished product in production by next Tuesday, latest. They have no self-awareness of how much context and disambiguation is needed to go from "idea in my head" to working, deterministic software that drives something like a process change in a business.
I know several people like this, and it seems they feel like they have god powers now - and that they alone can communicate with "the AI" in this way that is simply unreachable by the rest of the peasants.
"competitive", meaning: "most features/lines of code emitted" might matter to a PHB or Microsoft
but has never mattered to open source
QEMU is (mostly) GPL 2.0 licensed, meaning (most) code contributions need to be GPL 2.0 compatible [0]. Let's say, hypothetically, there's a code contribution added by some patch involving gen AI code which is derived/memorised/copied from non-GPL compatible code [1]. Then, hypothetically, a legal case sets precedent that gen AI FOSS code must re-apply the license of the original derived/memorised/copied code. QEMU maintainers would probably need to roll back all those incompatible code contributions. After some time, those code contributions could have ended up with downstream callers which also need to be rewritten (even in CI code).
It might be possible to first say "only CI code which is clearly labelled as 'DO NOT RE-USE: AI' or some such". But the maintainers would still need to go through and rewrite those parts of the CI code if this hypothetical plays out. Plus it adds extra work to reviews and merge processes etc.
it's just less work and less drama for everyone involved to say "no thank you (for now)".
----
caveat: IANAL, and licensing is not my specific expertise (but i would quite like it to be one day)
[0]: https://github.com/qemu/qemu/blob/master/LICENSE
[1]: e.g. No license / MPL / Apache / Artistic / Creative Commons https://www.gnu.org/licenses/license-list.html#NonFreeSoftwa...
But I refuse to use it as anything more than a fancy autocomplete. If it suggests code that's pretty close to what I was about to type anyway, I accept it.
This ensures that I still understand my code, that there shouldn't be any hallucination derived bugs, [1] and there really shouldn't be any questions about copyright if I was about to type it.
I find using copilot this way speeds me up. Not really because my typing is slow, it's more that I have a habit of getting bored and distracted while typing. Copilot helps me get to the next thinking/debugging part sooner.
My brain really can't comprehend the idea that anyone would not want to understand their code. Especially if they are going to submit it as a PR.
And I'm a little annoyed that the existence of such people is resulting in policies that will stop me from using LLMs as autocomplete when submitting to open source projects.
I have tried using copilot in other ways. I'd love for it to be able to do menial refactoring tasks for me. But every time I experiment, it seems to fall off the rails so fast. Or it just ends up slower than what I could do manually because it has to re-generate all my code instead of just editing it.
[1] Though I find it really interesting that if I'm in the middle of typing a bug, copilot is very happy to autocomplete it in its buggy form. Even when the bug is obvious from local context, like I've typoed a variable name.
It's only going to get more pervasive from now on.
We do that in corporate environments too. "I don't like this" -> "let me see what lawyers say" -> "a-ha, you can't do it because legal says it's a risk".
I am personally somewhere in the middle, just good enough to know I am really bad at this so I make sure that I don't contribute to anything that is actually important ( like QEMU ).
But how many people recognize their own strengths and weaknesses? That is part of the problem and now we are proposing that even that modicum of self-regulation ( as flawed as it is ) be removed.
FWIW, I hear you. I also don't have an answer. Just thinking out loud.
It's like complaining that I may have no legal right to submit my stick figure because I potentially copied it from the drawing of another stick figure.
I'm firmly convinced that these policies are only written to have plausible deniability when stuff with generated code gets inevitably submitted anyway. There's no way the people that write these things aren't aware they're completely unenforceable.
They redistribute the material under the CC BY-SA 4.0 license. https://creativecommons.org/licenses/by-sa/4.0/
This allows visitors to use the material, with attribution. One can, of course, use the ideas in a SO answer to develop one's own solution.
Yeah I don’t think so. But if it does then who cares? AI can just make a better QEMU at that point I guess.
They aren’t hurting anyone with this stance (except the AI hype lords), which I’m pretty sure isn’t actually an anti-AI stance, but a pragmatic response to AI slop in its current state.
In free software though, these kinds of nonsense suggestions always happened, way before AI. Just look at any project mailing list.
It is expected that any new suggestion will encounter some resistance, and new contributors themselves should be aware of that. For serious projects specifically, the levels of skepticism are usually way higher than at corporations, and that's healthy and desirable.
My comment didn't say anything about the output of AI being fair use or not, rather that fair use (no matter where you are getting material from) ipso facto doesn't mean that copy paste is considered okay.
Every employer I ever had discouraged copy and paste from anywhere as a blanket rule.
At least, that had been the norm, before the LLM takeover. Obviously, organizations that use AI now for writing code are plagiarizing left and right.
Of course it is. And nobody said otherwise, because that is explicitly stated in the commit message:
"[...] More broadly there is, as yet, no broad consensus on the licensing implications of code generators trained on inputs under a wide variety of licenses"
And in the patch itself:
"[...] With AI content generators, the copyright and license status of the output is ill-defined with no generally accepted, settled legal foundation."
What other commenters pointed out is that, beyond the legal issue, other problems also arise from the use of AI-generated code.
For what it's worth, I think AI for code will arrive at a place like how other coding tools sit: hinting, intellisense, linting, maybe even static or dynamic analysis, but I doubt NOT using AI will be a critical asset to productivity.
Someone else in the thread already mentioned it's a bit of an amplifier. If you're good, it can make you better, but if you're bad it just spreads your poor skills like a robot vacuum spreads animal waste.
Because for projects like QEMU, current AI models can actually do mind-boggling stuff. You can give it a PDF describing an instruction set, and it will generate you wrapper classes for emulating particular instructions. Then you can give it one class like this and a few paragraphs from the datasheet, and it will spit out unit tests checking that your class works as the CPU vendor describes.
Like, you can get from 0% to 100% test coverage several orders of magnitude faster than doing it by hand. Or refactoring, where you want to add support for a particular memory virtualization trick, and you need to update 100 instruction classes based on a straightforward, but not 100% formal, rule. A human developer would be pulling their hair out, while an LLM will do it faster than you can get a coffee.
The code example was AI generated. I couldn't find a single line of code anywhere in any codebase. 0 examples on GitHub.
And of course it didn't work.
But it sent me on a wild goose chase because I trusted this person to give me a valuable insight. It pisses me off so much.
It might actually be prudent for some (perhaps many foundational) OSS projects to reject AI until the full legal case law precedent has been established. If they begin taking contributions and we find out later that courts find this is in violation of some third party's copyright (as shocking as that outcome may seem), that puts these projects in jeopardy. And they certainly do not have the funding or bandwidth to avoid litigation. Or to handle a complete rollback to pre-AI background states.
AI further encourages the problem in DevOps/Systems Engineering/SRE where someone comes to you and says "hey can you do this for me", having already come up with a solution, instead of giving you the problem: "hey can you help me accomplish this". AI gives them solutions, which leaves you more steps away from untangling what really needs to be done.
AI has knowledge, but it doesn't have taste. Especially when it doesn't have all of the context a person with experience has, it just has bad taste in solutions, or an absence of taste, with the additional problem that it makes it much easier for people to do things.
Permissions on what people have access to read and permission to change is now going to have to be more restricted because not only are we dealing with folks who have limited experience with permissions, now we have them empowered by AI to do more things which are less advisable.
Which is entirely reasonable. The trend of people, say on HN, saying "I asked an LLM and this is what it said..." is infuriating.
It's just an upfront declaration that if your answer to something is "it's what Claude thinks" then it's not getting merged.
The thinking here is probably similar: if AI-generated code becomes poisonous and is detected in a project, the DCO could allow shedding liability onto the contributor that said it wasn’t AI-generated.
Seeing this new phenomenon must be difficult for those people who have spent a long time perfecting their craft. Essentially, they might feel that their skillsets are being undermined. It would be especially hard for people who associate a lot of their self-identity with their job.
Being a purist is noble, but I think that this stance is foolish. Essentially, people who choose not to use AI code tools will be overtaken by the people who do. That's the unfortunate reality.
Don’t be ridiculous. The majority of people are in fact honest, and won’t submit such code; the major effect of the policy is to prevent those contributions.
Then you get plausible deniability for code submitted by villains, sure, but I’d like to hope that’s rare.
The others are just too specific for me to be useful for anyone else: an android app for automatic processing of some text messages and a work scheduling/prioritising thing. The time to make them generic enough to share would be much longer than creating my specific version in the first place.
A far too common trap people fall into is the fallacy of "your job is easy as all you have to do is <insert trivialization here>, but my job is hard because ..."
Statistically generated text (token) responses constructed by LLMs to simplistic queries are an accelerant to the self-aggrandizing problem.
So it goes back for changes. It returns the next day with complete rewrites of large chunks. More "lgtm" from others. More incredibly obvious flaws, race conditions, the works.
And then round three repeats mistakes that came up in round one, because LLMs don't learn.
This is not a future style of work that I look forward to participating in.
In addition to the Structure, Sequence and Organization claims, the original filing included a claim for copyright violation on 9 identical lines of code in rangeCheck(). This claim was dropped after the judge asked Oracle to reduce the number of claims, which forced Oracle to pare down to their strongest claims.
I would wager good money that in a few years the most security-focused companies will be relying heavily on AI somewhere in their software supply chain.
So I don't think this policy is about security posture. No doubt human experts are reviewing the security-relevant patches anyway.
For example, if using Copilot, Microsoft also has every commit ever made if the project is on GitHub.
They could, theoretically, determine what did or didn't come out of their models and was integrated into source trees.
Regarding #2 and #3, with relatively novel software like QEMU that models platforms that other open source software doesn't, LLMs might not be a good fit for contributions. Especially where emulation and hardware accuracy, timing, quirks, errata etc matter.
For example, modeling a new architecture or emulating new hardware might have LLMs generating convincing looking nonsense. Similarly, integrating them with newly added and changing APIs like in kvm might be a poor choice for LLM use.
Sometimes it's fun reverse engineering the directions back into various forum, Stack Overflow, and documentation fragments and pointing out how AI assembled similar things into something incorrect
Makes total sense.
I am just wondering how do we differentiate between AI generated code and human written code that is influenced or copied from some unknown source. The same licensing problem may happen with human code as well especially for OSS where anyone can contribute.
Given the current usage, I am not sure if AI generated code has an identity of its own. It’s really a tool in the hand of a human.
You can’t dismiss it out of hand (especially with it coming from up the chain) but it takes no time at all to generate by someone who knows nothing about the problem space (or worse, just enough to be dangerous) and it could take hours or more to debunk/disprove the suggestion.
I don’t know what to call this? Cognitive DDOS? Amplified Plausibility Attack? There should be a name for it and it should be ridiculed.
I would find it very insulting if someone did this to me, for sure, as well as a huge waste of my time.
On the other hand I've also worked with some very intransigent developers who've actively fought against things they simply didn't want to do on flimsy technical grounds, knowing it couldn't be properly challenged by the requester.
On yet another hand, I've also been subordinate to people with a small amount of technical knowledge -- or a small amount of knowledge about a specific problem -- who'll do the exact same thing without ChatGPT: fire a bunch of mid-wit ideas downstream that you have already thought about, but you then need to spend a bunch of time explaining why their hot-takes aren't good. Or the CEO of a small digital agency I worked at circa 2004 asking us if we'd ever considered using CSS for our projects (which were of course CSS heavy).
There are simple algorithms that everyone will implement the same way down to the variable names, but aside from those fairly rare exceptions, there's no "maximum number of lines" metric to describe how much code is "fair use" regardless of the licence of the code "fair use"d in your scenario.
Depending on the context, even in the US that 5-second clip would not pass fair use doctrine muster. If I made a new film cut entirely from five second clips of different movies and tried a fair use doctrine defence, I would likely never see the outside of a courtroom for the rest of my life. If I tried to do so with licensing, I would probably pay more than it cost to make all those movies.
Look up the decisions over the last two decades over sampling (there are albums from the late 80s and 90s — when sampling was relatively new — which will never see another pressing or release because of these decisions). The musicians and producers who chose the samples thought they would be covered by fair use.
I don't think anyone is claiming that. If you submit changes to a FOSS project and an LLM assisted you in writing them how would anyone know? Assuming at least that you are an otherwise competent developer and that you carefully review all code before you commit it.
The (admittedly still controversial) claim being made is that developers with LLM assistance are more productive than those without. Further, that there is little incentive for such developers to advertise this assistance. Less trouble for all involved to represent it as 100% your own unassisted work.
It looks like LLMs are not good for cooperation, because the nature of an LLM is randomness.
I'm getting towards the end of a vibe coded ZFS storage backend to ganeti that includes the ability to live migrate VMs to another host by: taking snapshot and replicating it to target, pausing VM, taking another incremental snapshot and replicating it, and then unpausing the VM on the new destination machine. https://github.com/linsomniac/ganeti/tree/newzfs
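For anyone trying to picture the migration sequence described above, here is a rough Python sketch of just the ordering. It is not taken from the linked repository: `live_migrate`, the four callbacks, and the snapshot names `migrate-base` and `migrate-final` are all made up for illustration, and error handling (for example, what to do if the final replication fails while the VM is paused) is deliberately left out.

```python
from typing import Callable, List, Optional


def live_migrate(
    snapshot: Callable[[str], None],
    replicate: Callable[[str, Optional[str]], None],
    pause_vm: Callable[[], None],
    resume_on_target: Callable[[], None],
) -> None:
    """Run the steps in the order described; all callbacks are hypothetical."""
    # 1. Bulk copy while the VM keeps running.
    snapshot("migrate-base")
    replicate("migrate-base", None)
    # 2. Pause the VM so the final delta is consistent.
    pause_vm()
    # 3. Send only what changed since the base snapshot.
    snapshot("migrate-final")
    replicate("migrate-final", "migrate-base")
    # 4. Unpause the VM on the destination machine.
    resume_on_target()


def dry_run() -> List[str]:
    """Run the sequence with callbacks that only record what they were asked to do."""
    log: List[str] = []
    live_migrate(
        snapshot=lambda name: log.append(f"snapshot {name}"),
        replicate=lambda name, base: log.append(
            f"send {name}" if base is None else f"send {name} incremental from {base}"
        ),
        pause_vm=lambda: log.append("pause vm"),
        resume_on_target=lambda: log.append("resume vm on target"),
    )
    return log


if __name__ == "__main__":
    for step in dry_run():
        print(step)
```

The point of the two-snapshot design is that the VM is only paused for the small incremental transfer, not for the full copy.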
Other LLM tools I've built this week:
This afternoon I built a web-based SQL query editor/runner with results display, for dev/ops people to run read-only queries against our production database. To replace an existing super simple one, and add query syntax highlighting, snippet library, and other modern features. I can probably release this though I'd need to verify that it won't leak anything. Targets SQL Server.
A couple CLI Jira tools to pull a list of tickets I'm working on (with cache so I can get an immediate response, then get updates after Jira response comes back), and tickets with tags that indicate I have to handle them specially.
An icinga CLI that downtimes hosts, for when we do sweeping machine maintenances like rebooting a VM host with dozens of monitored children.
An Ansible module that is a "swiss army knife" for filesystem manipulation, merging the functions of copy, template, file, so you can loop over a list and: create a directory, template a couple files into it, doing a notify on one and a when on another, ensure a file exists if it doesn't already, to reduce duplication of boilerplate when doing a bunch of file deploys. This I will release as an Ansible Galaxy module once I have it tested a little more.
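The "immediate response from cache, then update" behaviour mentioned for the Jira tools above can be sketched roughly like this. This is only an illustration of the pattern, not the actual tool: `cached_then_fresh`, the JSON cache file, and the `fetch` callback (which would be the slow Jira call) are all assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List


def cached_then_fresh(
    cache_file: Path, fetch: Callable[[], List[str]]
) -> Iterator[List[str]]:
    """Yield the cached result first (if any), then the freshly fetched one."""
    # First yield whatever the last run saved, so the caller can print at once.
    if cache_file.exists():
        yield json.loads(cache_file.read_text())
    # Then make the slow call, save it for next time, and yield the update.
    fresh = fetch()
    cache_file.write_text(json.dumps(fresh))
    yield fresh
```

A caller would print the first yielded list right away and redraw when the second one arrives; on the very first run there is no cache, so only the fresh result is yielded.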
Remember, anyone can attempt to sue anyone for anything at any time in a functional system. How far the suit makes it is a different matter.
ai moves faster than group consensus. this ban won't slow down the tech, it may make paradigms like qemu harder to enter, harder to scale, harder to test thru properly
so if we maintain code like this we gotta know the trade we're making we're preserving trust but limiting throughput maybe fine idk but don't confuse it as future proofing
i kinda feel it does expose that trust in oss is social not epistemic. we accept complex things if we know who dropped it and we reject clean things if it smells synthetic
so the real qn isn't > did we use ai? it's > can we even maintain this in 6mo? and if the answer's yes doesn't really matter who produced the code fr
Better code and "AI assist coding" are not exclusive of each other.
It’s a power saw. A really powerful tool that can be dangerous if used improperly. In that sense the code generator can have more or less of a mind of its own depending on the wielder.
Ok I think I’ve stretched the analogy to the breaking point…
Really, really good tools.
So-called AI makes this worse.
Let me remind you of gyms, now that humans have been saved from much manual activity...
The barrier to being able to do a first commit on any project is usually quite high. There are plenty of people who would like to contribute to projects but cannot dedicate the time and effort to pass that initial threshold. This might allow people to contribute at a lower level while gently introducing them to the codebase, where perhaps they might become a regular contributor in the future.
I get that. But the AI tooling when guided by a competent human can generate some pretty competent code, a lot of it can be driven entirely through natural language instructions. And every few months, the tooling is getting significantly more capable.
I'm contemplating what exactly it means to "understand" the code though. In the case of one project I'm working on, it's an (almost) entirely vibe-coded new storage backend to an existing VM orchestration system. I don't know the existing code base. I don't really have the time to have implemented it by hand (or I would have done it a couple years ago).
But, I've set up a test cluster and am running a variety of testing scenarios on the new storage backend. So I understand it from a high level design, and from the testing of it.
As an open source maintainer myself, I can imagine (thankfully I haven't been hit with it myself) how frustrating getting all sorts of low quality LLM "slop" submissions could be. I also understand that I'm going to have to review the code coming in whether or not the author of the submission understands it.
So how, as developers, do we leverage these tools as appropriate, and signal to other developers the level of quality in code. As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code. ;-)
The AI tooling is also really, really good at being able to piece together the code, the contextual domain, the documentation, the tests, the related issues/tickets, it could even take the change history into account, and be able to help refresh your memory of unfamiliar code in the context of bugs or new changes you are looking at making.
Whether or not you go to the gym, you are probably going to want to use an excavator if you are going to dig a basement.
https://www.cnbc.com/2025/03/19/ai-art-cannot-be-copyrighted...
Here are cases where the products of AI/ML are not the products of people and so cannot be copyrighted. These are about the OUTPUT being unable to be copyrighted.
It's a good replacement for Google, but probably nothing close to what it's being hyped out to be by the capital allocators.
AI “assistance” is a short intermediate phase, like the “centaurs” that Garry Kasparov was very fond of (human + computer beat both a human and a computer by itself… until the computer-only became better).
If a project allows AI generated contributions, there's a risk that they'll be flooded with low quality contributions that consume human time and resources to review, thus paralyzing the project - it'd be like if you tried to read and reply to every spam email you receive.
So the argument goes that #2 and #3 will not materialize: blanket acceptance of AI contributions will not help projects become more competitive, it will actually slow them down.
Personally I happen to believe that reality will converge somewhere in the middle, you can have a policy which says among other things "be measured in your usage of AI," you can put the emphasis on having contributors do other things like pass unit tests, and if someone gets spammy you can ban them. So I don't think AI is going to paralyze projects but I also think its role in effective software development is a bit narrower than a lot of people currently believe...
https://github.com/linsomniac/ganeti/commit/e91766bfb42c67ab...
https://github.com/linsomniac/ganeti/commit/f52f6d689c242e3e...
There is zero evidence so far that AI improves software developer efficiency.
No, just because you had fun vibing with a chatbot doesn't mean you delivered the end product faster. All of the supposed AI software development gains are entirely self-reported based on "vibes". (Remember these are the same people who claimed massive developer efficiency gains from programming in Haskell or Lisp a few years back.)
Note I'm not even touching on the tech debt issue here, but it is also important.
P.S. The hallucination and counting to five problems will never go away. They are intrinsic to the LLM approach.
But we will need to get a lot better at finetuning first. People don't want generalist LLMs, they want "expert systems".
Was your comment tongue-in-cheek? If not, where is this huge mass of AI-generated software?
I feel like we'd be hearing from businesses that crushed their competition by delivering faster or with fewer people. Where are those businesses?
> But there are also local tools generated
This is really not the same thing as the original claim ("Software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects").
> These are early days of AI-assisted software development.
Are they? Or is this just IBM destroying another acquisition slowly?
Meanwhile the .NET Runtime is fully embracing AI. People on the outside may laugh at that, but you have extremely talented engineers like Stephen Toub and David Fowler advocating for it.
So enterprises: next time you have an IBM rep trying to sell you AI services, do yourself a favor and go to any other number of companies out there who are actually serious about helping you build for the future.
And since I am a North Carolina native, here’s to hoping IBM and Red Hat get their stuff together.
I don't feel like it's meaningful to discuss the "competitiveness" of a handful of bespoke local or internal tools.
Admitting such is like admitting you are overpaid for your job, and that a 20 USD AI-agent can do better and faster than you for 75% of the work.
Is it easy to admit that you have spent 10+ years learning skills that are progressively being replaced by a machine (like thousands of jobs in the past)?
More and more, developer is going to be a monkey job where your only task is to make sure there is enough coal in the steam machine.
Compilers destroyed the jobs of developers writing assembler code, they had to adapt. They insisted that hand-written assembler was better.
It's the same here, except you write code in natural language. It may not be optimal in all situations but it often gets the job done.
Companies don't care, so if you release something as open source that's relevant to them, "companies will simply take it, modify it, and never release their changes, and charge for it too" - but that is what companies do, that is their very nature, and you knew that when you first opened the source.
You also knew that when you picked a license, and it's a major reason for the particular choice you made. Want to force companies to share? Pick GPL.
If you decide to yoke a dragon, and it instead snatches your shiny lure and flies away to its cave, you don't get to complain that the dragon isn't playing nice and doesn't want to become your beast of burden. If you picked MIT as your license, that's on you.
The very knowledge that an organization is experiencing hyper acceleration due to its successful adoption of AI across the enterprise is proprietary.
There are no HBS case studies about businesses that successfully established and implemented strategic pillars for AI because the pillars were likely written in the past four months.
Yep, and it's not just code. Student essays, funding applications, internal reports, fiction, art...everything that AI touches has this problem that AI outputs look superficially similar to the work of experts.
As if tech part was the major part of getting the product to market.
Those businesses are probably everywhere. They just aren't open about admitting they're using AI to speed up their marketing/product design/programming/project management/graphics design, because a) it's not normal outside some tech startup sphere to brag about how you're improving your internal process, b) almost everyone else is doing that too, so it partially cancels out (that is what competition on the market means), and c) admitting to use of AI in the current climate is kind of a questionable PR move.
WRT. those who fail to leverage the new tools and are destined to be outcompeted, this process takes extended time, because companies have inertia.
>> But there are also local tools generated
> This is really not the same thing as the original claim
Point is that such wins compound. You get yak shaving done faster by fashioning your own tools on the fly, and it also cuts cost and a huge burden of maintaining relationships with third parties[0]
--
[0] - Because each account you create, each subscription you take, even each online tool you kinda track and hope hope hope won't disappear on you - each such case comes with a cognitive tax of a business relationship you probably didn't want, that often costs you money directly, and that you need to keep track of.
The most recent release includes a macOS build in a dmg signed by Apple: https://github.com/banagale/FileKitty/releases/tag/v0.2.3
I vibed that workflow just so more people could have access to this tool. It was a pain and it actually took time away from toenail clipping.
And while I didn't lay hands on a guitar much during this period, I did manage to build this while bouncing between playing Civil War tunes on a 3D-printed violin and generating music in Suno for a soundtrack to “Back on That Crust,” the missing and one true spiritual successor to ToeJam & Earl: https://suno.com/song/e5b6dc04-ffab-4310-b9ef-815bdf742ecb
If the definition is past any sort of length, it will hallucinate new properties, change the names, etc. It also has a propensity to start skipping bits of the definitions by adding in comments like "/** more like this here **/"
It may work for you for small YAML files, but beware doing this for larger ones.
Worst part about all that is that it looks right to begin with because the start of the definitions will be correct, but there will be mistakes and stuff missing.
I've got a PoC hanging around where I did something similar by throwing an OpenAPI spec at an AI and telling it to generate some TypeScript classes because I was being lazy and couldn't be bothered to run it through a formal tool.
Took me a while to notice that a lot of the definitions had subtle bugs: properties were missing and it had made a bunch of stuff up.
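For what it's worth, this class of mistake (dropped or invented properties) is cheap to check for mechanically. A minimal sketch, assuming you can get the property names out of the spec and out of the generated classes; `diff_properties`, `pet_schema` and `generated` are hypothetical names made up for illustration, not part of any OpenAPI tooling:

```python
def diff_properties(spec_schema, generated_fields):
    """Compare property names in an OpenAPI-style schema dict against
    the field names that ended up in the generated class."""
    expected = set(spec_schema.get("properties", {}))
    return {
        # In the spec but silently dropped by the generator
        "missing": sorted(expected - set(generated_fields)),
        # Not in the spec at all, i.e. made up by the generator
        "invented": sorted(set(generated_fields) - expected),
    }


# Hypothetical schema fragment, shaped like an OpenAPI object schema
pet_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "tag": {"type": "string"},
    },
}

# Hypothetical field names scraped from the generated class
generated = {"id", "name", "label"}

report = diff_properties(pet_schema, generated)
print(report)  # {'missing': ['tag'], 'invented': ['label']}
```

This only compares names, so it would not catch a wrong type or a changed nesting level, but it does flag the "looks right at the top, wrong further down" failure described above.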
https://github.com/TeMPOraL/qr-code-generator
Built with Aider and either Sonnet 3.5 or Gemini 2.5 Pro (I forgot to note that down in this project), and recently modified with Claude Code because I had to test it on something.
Getting the first version of this up was literally both faster and easier than finding a QR code generator that I'm sure is not bloated, not bullshit, not loaded with trackers, that's not using shorteners or its own URL (it's always a stupid idea to use URL shorteners you don't control), not showing ads, mining bitcoin and shit, one that my wife can use in her workflow without being distracted too much. Static page, domain I own, a bit of fiddling with LLMs.
What I can't link to is half a dozen single-use tools or faux tools created on the fly as part of working on something. But this happens to me a couple of times a month.
To anchor another vertex in this parameter space, I found it easier and faster to ask an LLM to build me a "breathing timer" (one that counts down N seconds and resets, repeatedly) with an analog indicator, because a search query to Google/Kagi would be of comparable length to the request, and then I'd have to click on results!
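To give a sense of how small the core of such a tool is, here is a minimal sketch of just the "count down N seconds and reset, repeatedly" logic. It is not the code the LLM produced; `breathing_ticks` is a hypothetical name, and the analog indicator and the one-second wait between ticks are left out:

```python
import itertools


def breathing_ticks(n):
    """Yield n, n-1, ..., 1 and then start over, forever.

    A real timer would wait one second between ticks and redraw
    its indicator; this sketch only produces the sequence.
    """
    if n < 1:
        # Raised on the first iteration, since this is a generator
        raise ValueError("n must be at least 1")
    while True:
        yield from range(n, 0, -1)


# Take the first ten ticks of a 4-second cycle
first = list(itertools.islice(breathing_ticks(4), 10))
print(first)  # [4, 3, 2, 1, 4, 3, 2, 1, 4, 3]
```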
EDIT: Okay, another example:
https://github.com/TeMPOraL/tampermonkey-scripts/blob/master...
It overlays a trivial UI to set up looping over a segment of any YouTube video, and automatically persists the setting by video ID. It solves the trivial annoyance of channel jingles and other bullshit at start/end of videos that I use repeatedly as background music.
This was mostly done zero-shot by Claude, with maybe two or three requests for corrections/extra features, total development time maybe 15 minutes. I've used it every day ever since.
You could say, "but SponsorBlock" or whatever, but per what GP wrote, I just needed a small fraction of functionality of the tools I know exist, and it was trivial to generate that with AI.
https://news.ycombinator.com/item?id=44384610
Funny, as the entire thing starts off with "Now, full disclosure, the title is a bit tongue-in-cheek.".
Who is going to "overtake" QEMU, what exactly does that mean, and what will it matter if they do?
No offense, it's really great that you are able to make apps that do exactly what you want, but your examples are not very good to show that "software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects" (as someone else suggested above). Complex real world software is different from pomodoro timers and TODO lists.
I just checked, though, and the code base I'm now working with has eight Stack Overflow links. Not all were even written by me, according to a quick check with git blame and git log -S.
> it would require speculative copyright owners to disassemble their binaries
I wonder whether AI might be a useful tool for making that easier.
If you have evidence then you can get courts to order disclosure or examination of code.
> And plenty of proprietary software has public domain code in it already.
I am pretty sure there is a significant amount of proprietary code that has FOSS code in it, against license terms (especially GPL and similar).
A lot of proprietary code is now being written using AIs trained on FOSS code, and companies are open about this. It might open an interesting can of worms.
Given the number of people on HN who say they're using e.g. Cursor, OpenAI, etc. through work, and my experience with workplaces saying 'absolutely you can't use it', I suspect a large amount is being leaked.
OK, but I asked for evidence and people just keep not providing any.
"God is all around you; he just works in mysterious ways"
OK, good luck with that.
For someone using MIT licensed code for training, it still requires a copy of the license and the copyright notice in "copies or substantial portions of the software". So I guess it's fine for a snippet, but if the AI reproduces too much of it, then it's in breach.
From the point of view of someone who does not want their code used by an LLM, using GPL code is more likely to be a breach.
Sometimes, an unreasonable dumbass whose only authority comes from corporate hierarchy is needed to mandate that the engineers start chipping away at the tasks. If they weren't a dumbass, they'd know the thing they're mandating is unreasonable, and if they weren't unreasonable, they wouldn't mandate that someone does it.
I am an engineer. "Sometimes" could be swapped for "rarely" above, but the point still stands: as much frustration as I have towards those people, they do occasionally lead to the impossible being delivered. But then again, a stopped clock -> twice a day etc.
If I'm having to reread something over and over to understand what they're even trying to accomplish, odds are it's either AI generated or an attempt at sounding smart instead of being constructive.
Choice is good. It means more slop, but also more gold. Figure out how to find the gold.
This is the target group for code generators. All talk but no projects.
It should be possible to build a useful AI code generator for a given programming language solely from the source code for the language itself. Doing so however would require some maturity.
https://github.com/neocotic/qrious
All the hard work was done by humans.
I can do `npm install` without having to pay for AI, thanks.
1. If you come up with something completely new, you are the sole copyright holder.
2. If you take someone else's copyrighted work and transform it, then both of you have a copyright on the derivative work.
So if you write a brand new comic book that includes Darth Vader, you can't sell that without Disney's permission [1]: they have a copyright on Darth Vader, and so your comic book is partly copyrighted by them. But at the same time, they can't sell it without your permission, because you have a copyright on the comic book too.
In the case of Midjourney outputs, my understanding of the current state of the law is this:
1. Only humans can create copyrights
2. So if Midjourney creates an entirely new image that's not derivative of anyone else's work (as defined by long-established copyright law on derivative works), then nobody owns the copyright, and it's in the public domain
3. If Midjourney creates an image that is derived from someone else's work (as defined by long-established copyright law on derivative works), then only the owner of that original work (Disney, in the Darth Vader example) has a copyright on that derivative work.
And so, in theory, Disney could distribute Darth Vader images you made with Midjourney, unless you can convince the court that you had enough creative influence over them to warrant a copyright.
[1] Yes of course fair use, trying to make a point here
I'd argue that the most impactful software security bugs in the last couple of decades (Heartbleed etc) have been errors of omission, rather than errors of inclusion.
This means LLMs are:
1) producing lots more code to be audited
2) poor at auditing that code for the most impactful class of bugs
That feels like a dangerous combination.
Welcome to the reality of software development. "Works on my machine" is often not good enough to make the cut.
You can't seriously be questioning the meaning of "understand"... That's straight from Jordan B. Peterson's debate playbook which does nothing but devolve the conversation into absurdism, while making the person sound smart.
> I've set up a test cluster and am running a variety of testing scenarios on the new storage backend. So I understand it from a high level design, and from the testing of it.
You understand the system as well as any user could. Your tests only prove that the system works in specific scenarios, which may very well satisfy your requirements, but they absolutely do not prove that you understand how the system works internally, nor that the system is implemented with a reliable degree of accuracy, let alone that it's not misbehaving in subtle ways or that it doesn't have security issues that will only become apparent when exposed to the public. All of this might be acceptable for a tool that you built quickly which is only used by yourself or a few others, but it's far from acceptable for any type of production system.
> As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code.
This doesn't match my (~20y) experience at all. Testing is important, particularly more advanced forms like fuzzing, but it's not a failproof method of surfacing bugs. Tests, like any code, can themselves have bugs; they can test the wrong things, set up or mock the environment in ways not representative of real-world usage, and, most importantly, can only cover a limited number of real-world scenarios. Even in teams that take testing seriously, achieving 100% coverage, even for just statements, is seen as counterproductive and a fool's errand. Deeply thorough testing as seen in projects like SQLite is practically unheard of. Most programmers I've worked with will often only write happy path tests, if they bother writing any at all.
Which isn't to say that code review is the solution. But a human reviewing the code, building a mental model of how it works and how it's not supposed to work, can often catch issues before the code is even deployed. It is at this point that writing a test is valuable, so that that specific scenario is cemented in the checks for the software, and regressions can be avoided.
So I wouldn't say that testing "trumps" reviews, but that it's not a reliable way of detecting bugs, and that both methods should ideally be used together.
I wonder what counts for transformed, is a filter enough or does it have to be more than that?
It can work very well when the higher-up is well informed and does have deep technical experience and understanding. Steve Jobs and Elon Musk are great, well-known examples of this. They've also provided great examples of the same approach mostly failing when applied outside of their areas of deep expertise and understanding.
"Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>"
The rest of the world might decide differently.
If it's that simple, sounds like you've got your solution! Go ahead and take care of it. If it fits V&V and other normal procedures, like passing tests and documentation, then we'll merge it in. Shouldn't be a problem for you since it will only take a moment.
And as long as you're not worried about people in the USA reusing your code then you're all good!
If we're talking about something that involves neither QEMU nor the people behind it, where is the relevance? It's just a rant on AI at that point.
This does not mean that powerful interests abusing copyright with ever increasing terms and enforcement overreach is fair game. It harms common interest.
However, it does mean that abusing copyright from the other side and denouncing the core ideas of IP ownership—which is now sort of in the interest of certain companies (and capital heavily invested in certain fashionable but not yet profitable startups) based around IP expropriation—harms common interest just as well.
An LLM said it, so it must be true.
While the AI we have now is not good enough to make an entire operating system when asked*, if/when they can, the benefits of all the current licensing models evaporate, and it doesn't matter if that model is proprietary with no source, or GPL, or MIT, because by that point anyone else can reproduce your OS for whatever the cost of tokens is without ever touching your code.
But as we're not there yet, I agree with @benlivengood that (most**) OSS projects must treat GenAI code as if it's unusable.
* At least, not a modern OS. I've not tried getting any model to output a tiny OS that would fit in a C64, and while I doubt they can currently do this, it is a bet I might lose, whereas I am confident all models would currently fail at e.g. reproducing Windows XP.
** I think MIT licensed projects can probably use GenAI code, they're not trying to require derivatives to follow the same licence, but I'm not a lawyer and this is just my barely informed opinion from reading the licenses.
> but your examples are not very good to show that "software projects that somehow are 100% human developed will not be competitive with AI assisted or written projects"
Here's the thing though - it's already the case, because I wouldn't create those tools by hand otherwise. I just don't have the time, and they're too personal/edge-case to pay anyone to make them. So the comparison in this case is between 100% human developed non-existent software and an AI generated project which exists. The latter wins in every category by default.
Real musicians don’t mix loops they bought.
Real musicians make their own synth patches.
Real musicians build their own instruments.
Real musicians hand-forge every metal component in their instruments.
…
They say real musicians raise goats for the leather for the drum-skins, but I wouldn't know because I haven’t made any music in months and the goats smell funny.
There are two points here: 1) even though most people on here know what npm is, many of us are not web developers and don't really know how to turn a random package into a useful webapp.
2) The AI is faster than googling a finished product that already exists, not just as an NPM package, but as a complete website.
Especially because search results require you to go through all the popups everyone stuffs everywhere because cookies, ads, before you even find out if it was actually a scam where the website you went to first doesn't actually do the right thing (or perhaps *anything*) anyway.
It is also, for many of us, the same price: free.
However, from my overall experience, I have been thinking about how this is going to be a massive boon to open source. So many patches, so many new tools will be created to streamline getting new packages into repos. Everything can be tested.
Open source is going to be epically boosted now.
QEMU deciding to sit out from this acceleration is crazy to me, but probably what is going to give Xen/Docker/Podman the lead.
Overall velocity doesn't come from writing a lot more code, or even from writing code especially quickly.
So my question is: if so many people should be bragging to me and celebrating how much better things are, why does it look to me like they are worse and everyone is miserable about it...?
I mean, sure, there's plenty of devs who refuse to use AI, but how many projects rather than individuals are in each category?
And is Microsoft "traditional"? I name them specifically because their CEO claims 20-30% of their new code is AI generated: https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-3...
Also a sufficiently good exponential solver would do the same thing.
I really like this phrasing, particularly in regards to PRs. I think I'll find a way to incorporate this into my projects. Even for smaller, non-critical projects, it's such a distraction to deal with people trying to make "contributions" that they don't clearly understand.
I'm just really confused about what people who send LLM content to other people think they are achieving. Like if I wanted an LLM response, I would just prompt the LLM myself, instead of doing it indirectly through another person who copy/pastes back and forth.
I think that needs actual testing. At what time distances is there an effect, and how big is it? Even if there is an effect, it could be small enough that a mild productivity boost from AI is more important.
Here it is invoking the actual zfs commands:
https://github.com/ganeti/ganeti/compare/master...linsomniac...
All the extra python boilerplate just makes it harder to understand IMHO.
If you comment about AI generated code in a thread about qemu (mission-critical project that many industries rely upon), a pomodoro app is not going to do the trick.
And no, it doesn't "show that is possible". qemu is not only more complex, it's a whole different problem space.
He also writes all his emails with chatgpt.
I don't bother reading.
Oddly enough he recently promoted a guy who has been fucking around with LLMs for years instead of working as his right hand man.
You only need to search for “loops goat skin”. You’re butchering the quote and its meaning quite a bit. The widely circulated version is:
> I thought using loops was cheating, so I programmed my own using samples. I then thought using samples was cheating, so I recorded real drums. I then thought that programming it was cheating, so I learned to play drums for real. I then thought using bought drums was cheating, so I learned to make my own. I then thought using premade skins was cheating, so I killed a goat and skinned it. I then thought that that was cheating too, so I grew my own goat from a baby goat. I also think that is cheating, but I’m not sure where to go from here. I haven’t made any music lately, what with the goat farming and all.
It’s not about “real musicians”¹ but a personal reflection on dependencies and abstractions and the nature of creative work and remixing. Your interpretation of it is backwards.
I am sorry, none of your points are made. Makes no sense.
The LLM work sounds dumb, and the suggestion that it made "a qr code generator" is disingenuous. The LLM barely did a frontend for it. Barely.
Regarding the "free" price, read the comment I replied to again:
> Built with Aider and either Sonnet 3.5 or Gemini 2.5 Pro
Paid tools.
It sounds like the author paid for `npm install`, and thinks he's on top of things and being smart.
> by that point anyone else can reproduce your OS for whatever the cost of tokens is without ever touching your code.
Do you think that the cost of tokens will remain low enough once these companies, which for now operate at a loss, have to be profitable, and that it really is going to be “anyone else”? Or would it be limited to “big tech” or a select few corporations who can pay a non-trivial amount of money to them?
Do you think it would mean they essentially sell GPL’ed code for proprietary use? Would it not affect FOSS, which has been till now partially powered by the promise to contributors that their (often voluntary) work would remain for public benefit?
Do you think someone would create and make public (and gather so much contributor effort for) something on the scale of Linux, if they knew that it would be open to be scraped by an intermediary who can sell it at whatever price they choose to set to companies that then are free to call it their own and repackage it commercially without contributing back, providing their source or crediting the original authors in any way?
By their own admission, this is just kind of OK. They don’t even know how good or bad it is, just that it kind of solved an immediate problem. That’s not how you create sustainable and reliable software. Which is OK, sometimes you just need to crap something out to do a quick job, but that doesn’t really feel like what your parent comment is talking about.
But notably, FOSS development is neither a corporation nor stock trading. It is focused on longevity and maintainability.
It's a thought terminating cliche.
That is a big assumption. If everyone were doing that, this wouldn’t be a major issue. But as the curl developer has noted, people are using LLMs without thinking and wasting everyone’s time and resources.
https://www.linkedin.com/posts/danielstenberg_hackerone-curl...
I can attest to that. Just the other day I got a bug report, clearly written with the assistance of an LLM, for software which has been stable and used in several places for years. This person, when faced with an error on their first try, did not ponder “what am I doing wrong” but instead opened a bug report with a “fix”. Of course, they were using the software wrong. They did not follow the very short and simple instructions and essentially invented steps (probably suggested by an LLM) that caused the problem.
Waste of time for everyone involved, and one more notch on the road to causing burnout. Some of the worst kinds of users are those who think “bug” means “anything which doesn’t immediately behave the way I thought it would”. LLMs empower them, to the detriment of everyone else.
"Why did you do this insane thing?"
"IDK, claude suggested it and it works."
Very old or old fashioned
I think under current management immigrants have no chance of getting promoted.
The person got upset at me for saying I could not accept such a thing.
There are other examples.
— How about you do it, motherfucker?! If it’s that simple, you do it! And when you can’t, I’ll come down there, push your face on the keyboard, and burn your office to the ground, how about that?
— Well, you don’t have to get mean about it.
— Yeah, I do have to get mean about it. Nothing worse than an ignorant, arrogant, know-it-all.
If Harlan Ellison were a programmer today.
It began as an experiment in AI-assisted app design and a cross-platform “cat these files” utility.
Since then it has picked up:
- Snapshot history (and change flags) for any file selection
- A rendered folder tree that LLMs can digest, with per-prompt ignore filters
- String-based ignore rules for both tree and file output, so prompts stay surgical
My recent focus is making that generated context modular, so additional inputs (logs, design docs, architecture notes) can plug in cleanly. Apple’s new on-device foundation models could pair nicely with that.
The bigger point: most AI tooling hides the exact nature of context. FileKitty puts that step in the open and keeps the programmer in the loop.
I continue to believe LLMs can solve big problems with appropriate context, and that intentionality in context prep is an important step in evaluating ideas and implementation suggestions found in LLM outputs.
There's a Homebrew build available and I'd be happy to take contributions: https://github.com/banagale/FileKitty
IP disputes aren't trivial, especially for shoestring-funded OSS.
But either way it's not an example of what they wanted.
??
"AI" code generators are still mostly overhyped nonsense that generate incorrect code all the time.