I'm a very old man shouting at clouds about this stuff. I don't want to review code the author doesn't understand, and I don't want to merge code that neither of us understands.
This really bothers me. I've had people ask me to do some task except they get AI to provide instructions on how to do the task and send me the instructions, rather than saying "Hey can you please do X". It's insulting.
These are the same people who think that "learning to code" is a translation issue they don't have time for, as opposed to experience they don't have.
---
LLM-Generated Contribution Policy
Color is a library full of complex math and subtle decisions (some of them possibly even wrong). It is extremely important that any issues or pull requests be well understood by the submitter and that, especially for pull requests, the developer can attest to the Developer Certificate of Origin for each pull request (see LICENCE).
If LLM assistance is used in writing pull requests, this must be documented in the commit message and pull request. If there is evidence of LLM assistance without such declaration, the pull request will be declined.
Any contribution (bug, feature request, or pull request) that uses unreviewed LLM output will be rejected.
---
I am also adding this to my `SECURITY.md` entries:
---
LLM-Generated Security Report Policy
Absolutely no security reports will be accepted that have been generated by LLM agents.
---
As it's mostly just me, I'm trying to strike a balance, but my preference is against LLM-generated contributions.
My high-level work is absolutely impossible to delegate to AI, but AI really helps with tedious or low-stakes incidental tasks. The other day I asked Claude Code to wire up some graphs and outlier analysis for some database benchmark result CSVs. Something conceptually easy, but it takes a fair bit of time to figure out the libraries and get everything hooked up unless you're already an expert at CSV processing.
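For context, the sketch below is roughly the shape of what I mean, assuming pandas/matplotlib and a hypothetical results.csv with a latency_ms column; it's not the actual script Claude produced.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the (hypothetical) benchmark results; assumes a "latency_ms" column.
df = pd.read_csv("results.csv")

# Flag outliers with a simple 1.5 * IQR rule.
q1, q3 = df["latency_ms"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["latency_ms"] < q1 - 1.5 * iqr) | (df["latency_ms"] > q3 + 1.5 * iqr)
print(f"{mask.sum()} outliers out of {len(df)} rows")

# Quick histogram of the latency distribution.
df["latency_ms"].plot(kind="hist", bins=50, title="Benchmark latency (ms)")
plt.savefig("latency_hist.png")
```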
This is very, very germane and a very quotable line. And these people have been around from long before LLMs appeared. These are the people who dash off an incomplete idea on Friday afternoon and expect to see a finished product in production by next Tuesday, latest. They have no self-awareness of how much context and disambiguation is needed to go from "idea in my head" to working, deterministic software that drives something like a process change in a business.
I know several people like this, and it seems they feel like they have god powers now - and that they alone can communicate with "the AI" in this way that is simply unreachable by the rest of the peasants.
But I refuse to use it as anything more than a fancy autocomplete. If it suggests code that's pretty close to what I was about to type anyway, I accept it.
This ensures that I still understand my code, that there shouldn't be any hallucination-derived bugs, [1] and that there really shouldn't be any questions about copyright if I was about to type it.
I find using copilot this way speeds me up. Not really because my typing is slow; it's more that I have a habit of getting bored and distracted while typing. Copilot helps me get to the next thinking/debugging part sooner.
My brain really can't comprehend the idea that anyone would not want to understand their own code. Especially if they are going to submit it as a PR.
And I'm a little annoyed that the existence of such people is resulting in policies that will stop me from using LLMs as autocomplete when submitting to open source projects.
I have tried using copilot in other ways. I'd love for it to be able to do menial refactoring tasks for me. But every time I experiment, it seems to fall off the rails so fast. Or it just ends up slower than what I could do manually, because it has to re-generate all my code instead of just editing it.
[1] Though I find it really interesting that if I'm in the middle of typing a bug, copilot is very happy to autocomplete it in its buggy form. Even when the bug is obvious from local context, like I've typoed a variable name.
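A contrived illustration of the kind of thing I mean (hypothetical example, not from a real project):

```python
def order_total(items):
    total = 0
    for item in items:
        # Mid-typing I typo'd the variable name:
        #   total += itme.price * item.quantity   # "itme" is undefined here
        # and the autocomplete happily suggested the rest of the expression
        # around the typo rather than flagging it. Corrected line:
        total += item.price * item.quantity
    return total
```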
In free software, though, these kinds of nonsense suggestions have always happened, long before AI. Just look at any project mailing list.
It is expected that any new suggestion will encounter some resistance, and the new contributor themselves should be aware of that. For serious projects specifically, the levels of skepticism are usually way higher than in corporations, and that's healthy and desirable.
The code example was AI generated. I couldn't find a single line of code anywhere in any codebase. 0 examples on GitHub.
And of course it didn't work.
But it sent me on a wild goose chase because I trusted this person to give me a valuable insight. It pisses me off so much.
AI further encourages the problem in DevOps/Systems Engineering/SRE where someone comes to you and says "hey, can you do this for me", having already come up with the solution, instead of giving you the problem: "hey, can you help me accomplish this?" AI gives them solutions, which adds more steps to untangle before you get to what really needs to be done.
AI has knowledge, but it doesn't have taste. Especially when it doesn't have all of the context a person with experience would have, it has bad taste in solutions, or just an absence of taste, with the additional problem that it makes it much easier for people to do things.
Permissions on what people can read and what they can change are now going to have to be more restricted, because not only are we dealing with folks who have limited experience with permissions, we now have them empowered by AI to do more things that are less advisable.
A far too common trap people fall into is the fallacy of "your job is easy as all you have to do is <insert trivialization here>, but my job is hard because ..."
Statistically generated text (token) responses constructed by LLMs to simplistic queries are an accelerant to the self-aggrandizing problem.
Sometimes it's fun reverse-engineering the directions back into various forum, Stack Overflow, and documentation fragments and pointing out how the AI assembled similar things into something incorrect.
You can’t dismiss it out of hand (especially with it coming from up the chain) but it takes no time at all to generate by someone who knows nothing about the problem space (or worse, just enough to be dangerous) and it could take hours or more to debunk/disprove the suggestion.
I don’t know what to call this. Cognitive DDoS? Amplified Plausibility Attack? There should be a name for it, and it should be ridiculed.
I would find it very insulting if someone did this to me, for sure, as well as a huge waste of my time.
On the other hand I've also worked with some very intransigent developers who've actively fought against things they simply didn't want to do on flimsy technical grounds, knowing it couldn't be properly challenged by the requester.
On yet another hand, I've also been subordinate to people with a small amount of technical knowledge -- or a small amount of knowledge about a specific problem -- who'll do the exact same thing without ChatGPT: fire a bunch of mid-wit ideas downstream that you have already thought about, but you then need to spend a bunch of time explaining why their hot-takes aren't good. Or the CEO of a small digital agency I worked at circa 2004 asking us if we'd ever considered using CSS for our projects (which were of course CSS heavy).
I get that. But the AI tooling when guided by a competent human can generate some pretty competent code, a lot of it can be driven entirely through natural language instructions. And every few months, the tooling is getting significantly more capable.
I'm contemplating what exactly it means to "understand" the code, though. In the case of one project I'm working on, it's an (almost) entirely vibe-coded new storage backend for an existing VM orchestration system. I don't know the existing code base. I don't really have the time to implement it by hand (or I would have done it a couple of years ago).
But, I've set up a test cluster and am running a variety of testing scenarios on the new storage backend. So I understand it from a high level design, and from the testing of it.
As an open source maintainer myself, I can imagine (thankfully I haven't been hit with it myself) how frustrating getting all sorts of low quality LLM "slop" submissions could be. I also understand that I'm going to have to review the code coming in whether or not the author of the submission understands it.
So how, as developers, do we leverage these tools appropriately, and signal to other developers the level of quality of the code? As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code. ;-)
If the definition is of any real length, it will hallucinate new properties, change the names, etc. It also has a propensity to start skipping bits of the definitions by adding in comments like "/** more like this here **/".
It may work for you for small YAML files, but beware doing this for larger ones.
The worst part is that it looks right at first glance, because the start of the definitions will be correct, but there will be mistakes and stuff missing.
I've got a PoC hanging around where I did something similar by throwing an OpenAPI spec at an AI and telling it to generate some TypeScript classes, because I was being lazy and couldn't be bothered to run it through a formal tool.
It took me a while to notice that a lot of the definitions had subtle bugs, properties were missing, and it had made a bunch of stuff up.
Sometimes, an unreasonable dumbass whose only authority comes from the corporate hierarchy is needed to mandate that the engineers start chipping away at the tasks. If they weren't a dumbass, they'd know how unreasonable the thing they're mandating is, and if they weren't unreasonable, they wouldn't mandate that someone does it.
I am an engineer. "Sometimes" could be swapped for "rarely" above, but the point still stands: as much frustration as I have towards those people, they do occasionally lead to the impossible being delivered. But then again, a stopped clock -> twice a day, etc.
You can't seriously be questioning the meaning of "understand"... That's straight from Jordan B. Peterson's debate playbook which does nothing but devolve the conversation into absurdism, while making the person sound smart.
> I've set up a test cluster and am running a variety of testing scenarios on the new storage backend. So I understand it from a high level design, and from the testing of it.
You understand the system as well as any user could. Your tests only prove that the system works in specific scenarios, which may very well satisfy your requirements, but they absolutely do not prove that you understand how the system works internally, nor that the system is implemented with a reliable degree of accuracy, let alone that it's not misbehaving in subtle ways or that it doesn't have security issues that will only become apparent when exposed to the public. All of this might be acceptable for a tool that you built quickly which is only used by yourself or a few others, but it's far from acceptable for any type of production system.
> As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code.
This doesn't match my (~20y) experience at all. Testing is important, particularly more advanced forms like fuzzing, but it's not a failproof method of surfacing bugs. Tests, like any code, can themselves have bugs; they can test the wrong things, set up or mock the environment in ways not representative of real-world usage, and, most importantly, can only cover a limited number of real-world scenarios. Even in teams that take testing seriously, achieving 100% coverage, even for just statements, is seen as counterproductive and a fool's errand. Deeply thorough testing as seen in projects like SQLite is practically unheard of. Most programmers I've worked with will often only write happy-path tests, if they bother writing any at all.
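As a contrived sketch of what "happy path only" looks like in practice (hypothetical function and test, not from any real codebase):

```python
def parse_port(value: str) -> int:
    # Happy-path implementation: no range check, no validation at all.
    return int(value)

def test_parse_port():
    # A typical happy-path-only test. It passes, but says nothing about
    # out-of-range values like "-1" or "99999", or non-numeric input.
    assert parse_port("8080") == 8080
```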
Which isn't to say that code review is the solution. But a human reviewing the code, building a mental model of how it works and how it's not supposed to work, can often catch issues before the code is even deployed. It is at this point that writing a test is valuable, so that that specific scenario is cemented in the checks for the software, and regressions can be avoided.
So I wouldn't say that testing "trumps" reviews, but that it's not a reliable way of detecting bugs, and that both methods should ideally be used together.
It can work very well when the higher-up is well informed and does have deep technical experience and understanding. Steve Jobs and Elon Musk are great, well-known examples of this. They've also provided great examples of the same approach mostly failing when applied outside of their areas of deep expertise and understanding.
If it's that simple, sounds like you've got your solution! Go ahead and take care of it. If it fits V&V and other normal procedures, like passing tests and documentation, then we'll merge it in. Shouldn't be a problem for you since it will only take a moment.
An LLM said it, so it must be true.
I really like this phrasing, particularly in regards to PRs. I think I'll find a way to incorporate this into my projects. Even for smaller, non-critical projects, it's such a distraction to deal with people trying to make "contributions" that they don't clearly understand.
I'm just really confused about what people who send LLM content to other people think they are achieving. If I wanted an LLM response, I would just prompt the LLM myself, instead of doing it indirectly through another person who copy/pastes back and forth.
He also writes all his emails with chatgpt.
I don't bother reading.
Oddly enough he recently promoted a guy who has been fucking around with LLMs for years instead of working as his right hand man.
I think under current management immigrants have no chance of getting promoted.
— How about you do it, motherfucker?! If it’s that simple, you do it! And when you can’t, I’ll come down there, push your face on the keyboard, and burn your office to the ground, how about that?
— Well, you don’t have to get mean about it.
— Yeah, I do have to get mean about it. Nothing worse than an ignorant, arrogant, know-it-all.
If Harlan Ellison were a programmer today.
The types don't exist outside of the YAML/JSON/etc.
You can't check them.