
728 points freetonik | 8 comments
Waterluvian ◴[] No.44976790[source]
I’m not a big AI fan but I do see it as just another tool in your toolbox. I wouldn’t really care how someone got to the end result that is a PR.

But I also think that if a maintainer asks you to jump before submitting a PR, you politely ask, “how high?”

replies(16): >>44976860 #>>44976869 #>>44976945 #>>44977015 #>>44977025 #>>44977121 #>>44977142 #>>44977241 #>>44977503 #>>44978050 #>>44978116 #>>44978159 #>>44978240 #>>44978311 #>>44978533 #>>44979437 #
wahnfrieden ◴[] No.44976869[source]
You should care. If someone submits a huge PR, you’re going to waste time asking questions and trying to comprehend their intentions, when the answer is that they don’t know either. If you know it’s generated and they haven’t reviewed it themselves, you can decide to shove it back into an LLM for next steps rather than expect the contributor to be able to do anything with your review feedback.

Unreviewed generated PRs can still be helpful starting points for further LLM work if they achieve desired results. But close reading with consideration of authorial intent, giving detailed comments, and asking questions of someone who didn't write or read the code is a waste of your time.

That's why we need to know if a contribution was generated or not.

replies(2): >>44977332 #>>44978112 #
1. KritVutGu ◴[] No.44977332[source]
You are absolutely right. AI is just a tool to DDoS maintainers.

Any contributor who was shown to post provably untested patches used to lose credibility. And now we're talking about accommodating people who don't even understand how the patch is supposed to work?

replies(1): >>44977852 #
2. wahnfrieden ◴[] No.44977852[source]
That’s not what I said though. LLM output, even unreviewed and without understanding, can be a useful artifact. I do it all the time - generate code, try running it, and then if I see it works well, I can decide to review it and follow up with necessary refactoring before integrating it. Parts of that can be contributed too. We’re just learning new etiquettes for doing that productively, and that does include testing the PR btw (even if the code itself is not understood or reviewed).

Example where this kind of contribution was accepted and valuable, inside the ghostty project itself: https://x.com/mitchellh/status/1957930725996654718

replies(1): >>44978129 #
3. nullc ◴[] No.44978129[source]
If the AI slop were that valuable, a project regular, who actually knows and understands the project, would be just as capable of asking the AI to produce it.
replies(1): >>44980012 #
4. wahnfrieden ◴[] No.44980012{3}[source]
Not according to ghostty maintainer Mitchell Hashimoto, per the link above.

It takes attempts, verifying the result behaves as desired, and iterative prompting to adjust. And it takes a lot of time waiting on agents between those steps (this work isn’t a one-shot response). You’re being reductive.

replies(1): >>44980154 #
5. nullc ◴[] No.44980154{4}[source]
We may be talking at cross purposes. I read the grandparent poster as discussing provably untested patches.

I have no clue about ghostty specifically, but I've seen plenty of stuff that doesn't compile, much less pass tests. And I assert there is nothing but negative value in such "contributions".

If real effort went into it, then maybe there is value, though it's not clear to me: when a project regular does the same work, at least they know the process. If there is some big PR moving things around, at least the author knows it's unlikely to slip in a backdoor. Once the change is reduced to some huge diff, it's much harder to gain that confidence.

In some projects, direct PRs for programmatic mass renames and such have been prohibited in favor of requiring submission of the script that produces the change, because it's easier to review the script carefully. The same may be necessary for AI.
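
For illustration only (the identifier names and paths here are made up), the reviewable artifact can be tiny compared to the diff it generates:

    import pathlib
    import re

    # Hypothetical mass rename: old_name -> new_name across a source tree.
    # Reviewing these few lines is far easier than reviewing the
    # hundred-file diff they produce.
    OLD, NEW = "old_name", "new_name"
    for path in pathlib.Path("src").rglob("*.py"):
        text = path.read_text()
        updated = re.sub(rf"\b{re.escape(OLD)}\b", NEW, text)
        if updated != text:
            path.write_text(updated)

Running the script regenerates the change deterministically, so the reviewer audits the rule rather than its output.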

replies(1): >>44980226 #
6. wahnfrieden ◴[] No.44980226{5}[source]
This whole original HN post is about ghostty, btw.

Having the original prompts (in sequence, and across potentially multiple models) can be valuable, but it is not necessarily useful for replicating the results, because of the slot-machine nature of the process.

replies(1): >>44980476 #
7. nullc ◴[] No.44980476{6}[source]
> This whole original HN post is about ghostty btw

Sure, though I believe few commenters care much about ghostty specifically and are primarily discussing the policy abstractly!

> because of the slot machine nature of it

One could use deterministically sampled LLMs with exact integer arithmetic... There is nothing fundamental preventing it from being completely reproducible.
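
A toy sketch of that point (a stand-in "model", not a real LLM): when the scores are exact integers and decoding is greedy argmax, there is no sampling or floating-point nondeterminism left to diverge across runs or machines:

    import hashlib

    VOCAB = ["foo", "bar", "baz", "qux"]

    def int_logits(context: str) -> list[int]:
        # Integer "logits" derived from a hash of the context; exact
        # arithmetic, so identical on every platform.
        digest = hashlib.sha256(context.encode()).digest()
        return [digest[i] for i in range(len(VOCAB))]

    def generate(prompt: str, steps: int = 5) -> str:
        out = prompt
        for _ in range(steps):
            scores = int_logits(out)
            out += " " + VOCAB[scores.index(max(scores))]  # greedy argmax
        return out

    assert generate("hello") == generate("hello")  # bit-for-bit reproducible

Real inference stacks pick up nondeterminism from floating-point reduction order and batching, which is why integer/quantized kernels plus greedy decoding are the usual route to full reproducibility.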

replies(1): >>44981041 #
8. wahnfrieden ◴[] No.44981041{7}[source]
Can't do that with state-of-the-art LLMs, and there's no sign of that changing (as the vendors like to retain control over model behavior). I would not want to use or contribute to a project that embraces LLMs yet disallows the leading models.

Besides, the output of an LLM is not really any more trustworthy (even if reproducible) than the contribution of an anonymous actor. Both require review of outputs. Reproducibility of output from prompt doesn't mean that the output followed a traceable logic such that you can skip a full manual code review as with your mass renaming example. LLMs produce antagonistic output from innocuous prompting from time to time, too.