But I also think that if a maintainer asks you to jump before submitting a PR, you politely ask, “how high?”
Unreviewed generated PRs can still be helpful starting points for further LLM work if they achieve the desired results. But close reading with attention to authorial intent, leaving detailed comments, and asking questions of someone who neither wrote nor read the code is a waste of your time.
That's why we need to know if a contribution was generated or not.
A contributor shown to have posted provably untested patches used to lose credibility. And now we're talking about accommodating people who don't even understand how the patch is supposed to work?
Here's an example from the ghostty project itself where this kind of contribution was accepted and proved valuable: https://x.com/mitchellh/status/1957930725996654718
It takes multiple attempts, verification that the result behaves as desired, and iterative prompting to adjust. And it takes a lot of time waiting on agents between those steps (this work isn't a one-shot response). You're being reductive.
I have no idea about ghostty specifically, but I've seen plenty of stuff that doesn't compile, much less pass tests. And I assert there is nothing but negative value in such "contributions".
If real effort went into it, then maybe there is value, though even that isn't clear to me. When a project regular does the same work, at least they know the process: if there is some big PR moving things around, the author at least knows it's unlikely to slip in a backdoor. Once the change is reduced to some huge diff, that confidence is much harder to gain.
In some projects, direct PRs for programmatic mass renames and the like have been prohibited in favor of requiring submission of the script that produces the change, because it's easier to review the script carefully. The same may be necessary for AI.
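For a sense of scale, the script such a policy asks for might be only a dozen lines, which is far easier to audit than the thousand-line diff it produces. A hypothetical sketch (the identifiers and the "src" directory are made up for illustration):

    # Whole-word rename across a source tree; reviewing this tells you
    # exactly what the resulting diff can and cannot contain.
    import pathlib
    import re

    OLD, NEW = "frobnicate", "frobnicate_v2"  # illustrative names only

    for path in pathlib.Path("src").rglob("*.c"):
        text = path.read_text()
        updated = re.sub(rf"\b{OLD}\b", NEW, text)  # whole-word matches only
        if updated != text:
            path.write_text(updated)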
Having the original prompts (in sequence, and potentially across multiple models) can be valuable, but it isn't necessarily useful for replicating the results, because of the slot machine nature of it.
Sure, though I believe few commenters care much about ghostty specifically; most are primarily discussing the policy in the abstract!
> because of the slot machine nature of it
One could use deterministically sampled LLMs with exact integer arithmetic... There is nothing fundamental preventing it from being completely reproducible.
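For instance, greedy (sampling-free) decoding already removes the sampling variance. A minimal sketch, assuming the Hugging Face transformers API, with a placeholder model and prompt:

    # Deterministic generation: greedy decoding plus a fixed seed.
    # Bit-exact reproducibility across machines additionally requires
    # identical numerics, which is where the "exact integer arithmetic"
    # point above comes in.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.manual_seed(0)
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("Rename the helper function to ...", return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=32)  # greedy: no sampling
    print(tok.decode(out[0], skip_special_tokens=True))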
Besides, the output of an LLM is not really any more trustworthy (even if reproducible) than the contribution of an anonymous actor; both require review. Reproducibility of the output from the prompt doesn't mean the output followed traceable logic that lets you skip a full manual code review, as in your mass-renaming example. LLMs also produce antagonistic output from innocuous prompting from time to time.