But I also think that if a maintainer asks you to jump before submitting a PR, you politely ask, “how high?”
Unreviewed generated PRs can still be helpful starting points for further LLM work if they achieve the desired results. But close reading with consideration of authorial intent, giving detailed comments, and asking questions of someone who neither wrote nor read the code is a waste of your time.
That's why we need to know if a contribution was generated or not.
Any contributor shown to have posted provably untested patches used to lose credibility. And now we're talking about accommodating people who don't even understand how the patch is supposed to work?
An example from the ghostty project itself where this kind of contribution was accepted and proved valuable: https://x.com/mitchellh/status/1957930725996654718
It would be nice if they did, in fact, say they didn't know. But more often they just waste your time making their chatbot argue with you. And the chatbots are outrageous gaslighters.
All big OSS projects have had the occasional bullshitter/gaslighter show up. But LLMs have increased the incidence of these sorts of contributors by many orders of magnitude-- I consider it an open question whether open-public-contribution open source is viable in a post-LLM world.
It takes multiple attempts, verification that the result behaves as desired, and iterative prompting to adjust. And it takes a lot of time to wait on agents between those steps (this work isn't a one-shot response). You're being reductive.
I have no idea about ghostty specifically, but I've seen plenty of stuff that doesn't compile, much less pass tests. And I assert there is nothing but negative value in such "contributions".
If real effort went into it, then maybe there is value-- though it's not clear to me. When a project regular does the same work, at least they know the process: if there's some big PR moving things around, the author at least knows it's unlikely to have slipped in a backdoor. Once the change is reduced to a huge diff, that confidence is much harder to gain.
In some projects, direct PRs for programmatic mass renames and the like have been prohibited in favor of requiring submission of the script that produces the change, because it's easier to review the script carefully. The same may be necessary for AI.
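To make that concrete, the reviewable artifact under such a policy would be something like the small rename script below rather than the thousand-line diff it emits. This is a hypothetical sketch in Python; the "src" directory, file pattern, and identifiers are made-up placeholders, not any project's actual tooling:

    #!/usr/bin/env python3
    # Hypothetical mass-rename script: reviewers audit this file,
    # then regenerate the diff mechanically instead of reading it line by line.
    import pathlib
    import re

    OLD, NEW = "old_name", "new_name"  # placeholder identifiers
    pattern = re.compile(rf"\b{re.escape(OLD)}\b")  # whole-word match only

    for path in pathlib.Path("src").rglob("*.py"):
        text = path.read_text()
        updated = pattern.sub(NEW, text)
        if updated != text:
            path.write_text(updated)

The point is that fifteen auditable lines carry the author's intent, while the generated diff only carries its consequences.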
Having the original prompts (in sequence, and across potentially multiple models) can be valuable, but it isn't necessarily useful for replicating the results because of the slot machine nature of it.
Sure, though I believe few commenters care much about ghostty specifically; they're primarily discussing the policy abstractly!
> because of the slot machine nature of it
One could use deterministically sampled LLMs with exact integer arithmetic... There is nothing fundamental preventing it from being completely reproducible.
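Greedy decoding already removes the sampling randomness; here's a minimal sketch assuming the Hugging Face transformers API with GPT-2 as a stand-in model (bit-for-bit reproducibility across machines would additionally require pinned model weights, library versions, and the kind of deterministic/exact arithmetic mentioned above):

    # Greedy decoding: no sampling randomness, so the same prompt yields the
    # same tokens on the same software/hardware stack. Cross-machine
    # reproducibility still depends on eliminating floating-point variation.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tok("Refactor this function to", return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))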
Everyone promoting LLMs, especially on HN, claims to be using them expertly, with artisanal prompts and careful examination of the output, but... I'm honestly skeptical. Sure, some people are doing that (I do it from time to time). But I've seen enough slop to think that more people are throwing around code they barely understand than these advocates care to admit.
Those same people will swear that they did due diligence, but why would they admit otherwise? And do they even know what proper due diligence is? And would they still be getting their mythical 30%-50% productivity boost if they were actually doing what they claimed they were doing?
And that is a problem. I cannot have a productive code review with someone who does not even understand what their code is actually doing, much less the trade-offs that were made in an implementation (because they did not consider any trade-offs at all and just took what the LLM produced). If they can't have a conversation about the code because they didn't bother to read or understand any of it, then there's nothing I can do except close the PR and tell them to actually do the work this time.
Besides, the output of an LLM is not really any more trustworthy (even if reproducible) than the contribution of an anonymous actor. Both require review. Reproducibility of output from a prompt doesn't mean the output followed traceable logic such that you can skip a full manual code review, as with your mass-renaming example. LLMs produce antagonistic output from innocuous prompting from time to time, too.