But I also think that if a maintainer asks you to jump before submitting a PR, you politely ask, “how high?”
If trust didn't matter, there wouldn't have been a need for the Linux Kernel team to ban the University of Minnesota for attempting to intentionally smuggle bugs through the PR process as part of an unauthorized social experiment. As it stands, if you / your PRs can't be trusted, they should not even be admitted to the review process.
No you don’t. You can’t outsource trust determinations. Especially to the people you claim not to trust!
You make the judgement call by looking at the code and your known history with the contributor.
Nobody cares if contributors use an LLM or a magnetic needle to generate code. They care if bad code gets introduced or bad patches waste reviewers’ time.
That’s exactly the opposite of what the author is saying. He mentions that [if the code is not good, or you are a beginner] he will help you get to the finish line, but if it’s LLM code, he shouldn’t be putting in effort because there’s no human on the other side.
It makes sense to me.
Otherwise, what’s the harm in saying AI guides you to the solution if you can attest to it being a good solution?
I don’t get it at all. Feels like modernity is oftentimes just inventing pale shadows of things, with more addictive hooks, to induce needlessly dependent behavior.
If I just vibe-coded something and haven't looked at the code myself, that seems like a necessary thing to disclose. But beyond that, if the code is well understood and solid, I feel that I'd be clouding the conversation by unnecessarily bringing the tools I used into it. If I understand the code and feel confident in it, whether I used AI or not seems irrelevant and distracting.
This policy is just shoving the real problem under the rug. Generative AI is going to require us to come up with better curation/filtering/selection tooling, in general. This heuristic of "whether or not someone self-disclosed using LLMs" just doesn't seem very useful in the long run. Maybe it's a piece of the puzzle but I'm pretty sure there are more useful ways to sift through PRs than that. Line count differences, for example. Whether it was a person with an LLM or a 10x coder without one, a PR that adds 15000 lines is just not likely to be it.
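For illustration, a back-of-the-envelope version of that line-count filter (the threshold and branch names here are made up; assumes the PR has been fetched as a local branch):

    import subprocess

    MAX_ADDED_LINES = 1000  # hypothetical cutoff, tune per project

    def added_lines(base: str, head: str) -> int:
        # `git diff --numstat` prints "added<TAB>removed<TAB>path" per file
        out = subprocess.run(
            ["git", "diff", "--numstat", f"{base}...{head}"],
            capture_output=True, text=True, check=True,
        ).stdout
        total = 0
        for line in out.splitlines():
            added, _removed, _path = line.split("\t", 2)
            if added != "-":  # binary files report "-" instead of a count
                total += int(added)
        return total

    if added_lines("main", "pr-branch") > MAX_ADDED_LINES:
        print("flag for extra scrutiny: unusually large PR")

Crude, sure, but unlike self-disclosure it can't be gamed by simply staying quiet.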
If you’re unwilling to stop using slop tools, then you don’t get to contribute to some projects, and you need to accept that.
This is the core problem with AI that makes so many people upset. In the old days, if you got a substantial submission, you knew a substantial amount of effort went into it. You knew that someone at some point had a mental model of what the submission was. Even if they didn't translate that perfectly, you could still try to figure out what they meant and were thinking. You knew the submitter put forth significant effort. That is a real signal that they are both willing and able to address the issues you raise going forward.
The existence of AI slop fundamentally breaks these assumptions. That is why we need enforced social norms around disclosure.
Stop trying to equate LLM-generated code with indexing-based autocomplete. They’re not the same thing at all: LLM-generated code is equivalent to code copied off Stack Overflow, which is also something you’d better not be attempting to fraudulently pass off as your own work.
My little essay up there is more a response to the heated "LLM people vs pure people" comments I'm reading all over this discussion. Some of this stuff just seems entirely misguided and fear-driven.
10x engineers create so many bugs without AI, and vibe coding could multiply that to 100x. But let's not distract from the source of that, which is rewarding the false confidence it takes to pretend we understand stuff that we actually don't.
I think you just haven't gotten the hang of it yet, which is fine... the tooling is very immature and hard to get consistent results with. But this isn't a given. Some people do get good, steerable LLM coding setups.
but maybe those don't need to be about "whether or not you used LLMs" and might have more to do with "how well you understand the code you are opening a PR for" (or are reviewing, for that matter)
AI is a great proxy for how much understanding someone has. If you're writing a PR yourself, you're demonstrating some manner of understanding. If you're submitting AI slop, you're not.
The only reason one may not want disclosure is if one can’t write anything by oneself; then they would have to label all their code as AI-generated, and everyone would see their real skill level.
https://youtu.be/klW65MWJ1PY?t=1320
X sucks and should not be allowed to proceed with what they're doing in Memphis. Nor should Meta be allowed to proceed with multiple Manhattan sized data centers.
That's a pretty nice offer from one of the most famous and accomplished free software maintainers in the world. He's promising not to take a short-cut reviewing your PR, in exchange for you not taking a short-cut writing it in the first place.
It seems a bit like saying you can’t trust a legal document because it was written on a computer with spellcheck, rather than by a $10 an hour temp with a typewriter.
In an open source project I think you have to start with a baseline assumption of "trust nobody." Exceptions possibly if you know the contributors personally, or have built up trust over years of collaboration.
I wouldn't reject or decline to review a PR just because I don't trust the contributor.
LLMs are trained to be steerable at inference time via context/prompting. Fine tuning is also possible and often used. Both count as "feedback" in my book, and my point is that both can be effective at "changing the LLM" in terms of its behavior at inference time.
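To make that concrete, here's a minimal sketch of inference-time "feedback" (assuming the openai Python client; the model name and prompts are placeholders, and any chat-style API works the same way). The reviewer's comments just get appended to the context, and the next generation is conditioned on them:

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # placeholder; substitute whatever model you use

    history = [
        {"role": "system", "content": "You write patches for project X in its house style."},
        {"role": "user", "content": "Fix the off-by-one in parse_header()."},
    ]
    draft = client.chat.completions.create(model=MODEL, messages=history)
    history.append({"role": "assistant", "content": draft.choices[0].message.content})

    # "Feedback" at inference time: append the review and regenerate.
    # No weights change, but the observable behavior does.
    history.append({"role": "user", "content": "Review: don't index past len(buf); add a test."})
    revised = client.chat.completions.create(model=MODEL, messages=history)
    print(revised.choices[0].message.content)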
The PR effectively ends up being an extremely high-latency conversation with an LLM, via another human who doesn't have the full context/understanding of the problem.
This “short cut” language suggests that the quality of the submission is going to be objectively worse by way of its provenance.
Yet, can one reliably distinguish working and tested code generated by a person vs a machine? We’re well past passing Turing tests at this point.
IMO when people declare that LLMs "pass" at a particular skill, it's a sign that they don't have the taste or experience to judge that skill themselves. Or - when it's CEOs - they have an interest in devaluing it.
So yes if you're trying to fool an experienced open source maintainer with unrefined LLM-generated code, good luck (especially one who's said he doesn't want that).
If they had used AI, their PRs might have been more understandable / less buggy, and ultimately I would have preferred that.
Sure, and if they had used AI pigs could depart my rectum on a Part 121 flight. One has absolutely nothing to do with the other. Submitting AI slop does not demonstrate any knowledge of the code in question, even if you do understand the code.

To address your claim about AI slop improving the output of these mythical 10x coders: doubtful. LLMs can only approximate meaningful output if they've already indexed the solution. If your vaunted 10x coders are working on already-solved problems, you're likely wasting their time. If they're working on something novel, LLMs are of little use.

For instance: I've had the pleasure of working with a notoriously poorly documented crate that also has a reputation for frequently making breaking changes. I used DDG and Google to see if I could track down someone with a similar use case. If I forgot to append "-ai" to the query, I'd get back absolutely asinine results, typically along the lines of "here's an answer with rust and one of the words in your query". At best the first sentence would explain something entirely unrelated about the crate.
Potentially LLMs could be improved by ingesting more and more data, but that's an arms race they're destined to lose. People are already turning to Cloudflare and Anubis en masse to avoid being billed for training LLMs. If Altman and co. had to pay market rate for their training data nobody could afford to use these AI doodads.
The person driving the LLM is a teachable human who can learn what's going on and learn to improve the code. It's simply not true that there's no person on the other side of the PR.
The idea that we should be comparing "teaching a human" to "teaching an LLM" is yet another instance of this false equivalence.
It's not inherently pointless to provide feedback on a PR with code written using an LLM, that feedback goes to the person using the LLM tools.
People are swallowing this b.s. marketing mystification of "LLMs as non human entities". But really they're fancy compilers that we have a lot to learn about.
For example, you either make your contributors attest that their changes are original or that they have the right to contribute their changes—or you assume this of them and consider it implicit in their submission.
What you (probably) don’t do is welcome contributions that the contributors do not have the right to make.
Assuring you didn’t include any AGPLv3 code in your contribution is exactly the same kind of assurance. It also doesn’t provide any provenance.
Conflating assurance with provenance is bogus because the former is about making a representation that, if false, exposes the person making it to liability. For most situations that’s sufficient that provenance isn’t needed.
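The Linux kernel's Developer Certificate of Origin works exactly this way: the contributor attests to having the right to submit the change by adding a Signed-off-by trailer (git commit -s appends it automatically). The subject line and name below are made up, but the trailer format is real:

    Fix buffer overrun in parse_opts()

    Signed-off-by: Jane Hacker <jane@example.org>

The trailer is a representation about the right to submit under the project's license; it says nothing about where the code actually came from.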
Presumably if a contributor repeatedly made bad PRs that didn't do what they said, introduced bugs, scribbled pointlessly on the codebase, and when you tried to coach or clarify at best they later forgot everything you said and at worst outright gaslit and lied to you about their PRs... you would reject or decline to review their PRs, right? You'd presumably ban them outright.
Well that's exactly what commercial LLM products, with the aid of less sophisticated users, have already done to the maintainers of many large open source projects. It's not that they're not trusted-- they should be distrusted with ample cause.
So what if the above banned contributor kept getting other people to mindlessly submit their work and even proxy communication through -- evading your well earned distrust and bans? Asking people to at least disclose that they were acting on behalf of the distrusted contributor would be the least you would do, I hope? Or even asking them to disclose if and to what extent their work was a collaboration with a distrusted contributor?
IF they disclose what they've done, provide the prompts, etc., then other contributors can help them get better results from the tools. But that feedback is very different from the feedback you'd give a human who actually wrote the code in question; the latter kind of feedback is unlikely to be of much value (and even less likely to persist).
Thanks for putting it so well.
That is what hurts. A lot. Taking pride out of work, especially creative work, makes the world a worse place; it makes life less worth living.
> inventing pale shadows of things
Yes.
For one: it threatens to make an entire generation of programmers lazy and stupid. They stop exercising their creative muscle. Writing and reviewing are different activities; both should be done continuously.
This is perfectly observable with a foreign language. If you stop actively using a foreign language after learning it really well, your ability to speak it fades pretty quickly, while your ability to understand it fades too, but less quickly.
Exactly! The code used to double as "proof of work". Well-formed language used to double as "proof of thinking". And that's what AI breaks: it speaks, but doesn't think. And my core point is that language that does not originate from well-reasoned human effort (i.e., from either writing the language directly, or from manually writing code that generates the language deterministically, and for known reasons/intents), does not deserve human attention. Even if the "observable behavior" of such language (when executed as code) looks "alright".
And because I further think that no code should be accepted without human review (which excludes both not reviewing AI-generated code at all and having some other AI review the AI-generated code), I conclude that AI-generated code can never be accepted.
Such behaviors can only be normalized in a classroom / ramp-up / mentorship-like setting. Which is very valid, BUT:
- Your reviewers are always overloaded, so they need some official mandate / approval to mentor newcomers. This is super important, and should be done everywhere.
- Even with the above in place: because you're being mentored with great attention to detail, you owe it to your reviewer not to drown them in AI slop. You must honor them by yourself writing every single line that you ask them to spend their attention on. Ultimately, their educative efforts are invested IN YOU, not (only) in the code that may finally be merged. I absolutely refuse to review or otherwise correct AI slop, while at the same time I'm 100% committed to transferring whatever knowledge I may have to another human.
Fuck AI.