728 points by freetonik | 12 comments
Waterluvian ◴[] No.44976790[source]
I’m not a big AI fan but I do see it as just another tool in your toolbox. I wouldn’t really care how someone got to the end result that is a PR.

But I also think that if a maintainer asks you to jump before submitting a PR, you politely ask, “how high?”

replies(16): >>44976860 #>>44976869 #>>44976945 #>>44977015 #>>44977025 #>>44977121 #>>44977142 #>>44977241 #>>44977503 #>>44978050 #>>44978116 #>>44978159 #>>44978240 #>>44978311 #>>44978533 #>>44979437 #
1. wahnfrieden ◴[] No.44976869[source]
You should care. If someone submits a huge PR, you're going to waste time asking questions and trying to comprehend their intentions, only to find that they don't know either. If you know it's generated and they haven't reviewed it themselves, you can decide to shove it back into an LLM for next steps rather than expect the contributor to be able to do anything with your review feedback.

Unreviewed generated PRs can still be helpful starting points for further LLM work if they achieve desired results. But close reading with consideration of authorial intent, giving detailed comments, and asking questions of someone who didn't write or read the code is a waste of your time.

That's why we need to know if a contribution was generated or not.

replies(2): >>44977332 #>>44978112 #
2. KritVutGu ◴[] No.44977332[source]
You are absolutely right. AI is just a tool to DDoS maintainers.

Any contributor who was shown to post provably untested patches used to lose credibility. And now we're talking about accommodating people who don't even understand how the patch is supposed to work?

replies(1): >>44977852 #
3. wahnfrieden ◴[] No.44977852[source]
That's not what I said, though. LLM output, even unreviewed and without understanding, can be a useful artifact. I do it all the time: generate code, try running it, and then, if I see it works well, decide to review it and follow up with necessary refactoring before integrating it. Parts of that can be contributed too. We're just learning new etiquette for doing that productively, and that does include testing the PR, btw (even if the code itself is not understood or reviewed).

An example where this kind of contribution was accepted and valuable, inside the ghostty project itself: https://x.com/mitchellh/status/1957930725996654718

replies(1): >>44978129 #
4. nullc ◴[] No.44978112[source]
> is that they don’t know either

It would be nice if they did, in fact, say they didn't know. But more often they just waste your time making their chatbot argue with you. And the chatbots are outrageous gaslighters.

All big OSS projects have had the occasional bullshitter/gaslighter show up, but LLMs have increased the incidence of these sorts of contributors by many orders of magnitude. I consider it an open question whether open-public-contribution open source is viable in a post-LLM world.

replies(1): >>44980839 #
5. nullc ◴[] No.44978129{3}[source]
If the AI slop were that valuable, a project regular who actually knows and understands the project would be just as capable of asking the AI to produce it.
replies(1): >>44980012 #
6. wahnfrieden ◴[] No.44980012{4}[source]
Not according to ghostty maintainer Mitchell Hashimoto, per the link above.

It takes attempts, verification that the result behaves as desired, and iterative prompting to adjust. And it takes a lot of time waiting on agents between those steps (this work isn't a one-shot response). You're being reductive.

replies(1): >>44980154 #
7. nullc ◴[] No.44980154{5}[source]
We may be talking at cross purposes. I read the grandparent poster as discussing provably untested patches.

I have no clue about ghostty, but I've seen plenty of stuff that doesn't compile, much less pass tests, and I assert there is nothing but negative value in such "contributions".

If real effort went into it, then maybe there is value, though it's not clear to me: when a project regular does the same work, at least they know the process. If there is some big PR moving things around, at least the author knows it's unlikely to have a backdoor slipped in. Once the change is reduced to some huge diff, it's much harder to gain that confidence.

In some projects, direct PRs for programmatic mass renames and the like have been prohibited in favor of requiring submission of the script that produces the change, because it's easier to review the script carefully. The same may be necessary for AI.
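
To make that concrete, here is a minimal sketch of the kind of script a contributor might submit in place of the diff (file paths and identifiers are hypothetical):

    # rename.py -- hypothetical mass-rename script; reviewers read this, not the diff it emits
    import pathlib
    import re

    OLD = re.compile(r"\bold_widget\b")  # identifier being renamed (illustrative)
    NEW = "new_widget"

    for path in pathlib.Path("src").rglob("*.py"):
        text = path.read_text()
        updated = OLD.sub(NEW, text)
        if updated != text:
            path.write_text(updated)
            print(f"rewrote {path}")

Reviewing a dozen lines of transformation is tractable; reviewing the thousands of changed lines it emits is not.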

replies(1): >>44980226 #
8. wahnfrieden ◴[] No.44980226{6}[source]
This whole original HN post is about ghostty btw

Having the original prompts (in sequence, and across potentially multiple models) can be valuable, but it is not necessarily enough to replicate the results, because of the slot-machine nature of it.

replies(1): >>44980476 #
9. nullc ◴[] No.44980476{7}[source]
> This whole original HN post is about ghostty btw

Sure, though I believe few commenters care much about ghostty specifically; most are primarily discussing the policy in the abstract.

> because of the slot machine nature of it

One could use deterministically sampled LLMs with exact integer arithmetic... There is nothing fundamental preventing it from being completely reproducible.
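
As a rough illustration (a sketch only, using a small open model via the Hugging Face transformers API; hosted frontier models don't expose this, and exact integer arithmetic would go further by removing cross-hardware float variation), greedy decoding makes the output a pure function of the prompt and the weights:

    # Hypothetical sketch: greedy (argmax) decoding, so the token sequence depends
    # only on the prompt and the model weights, given deterministic kernels.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.manual_seed(0)
    torch.use_deterministic_algorithms(True)  # error out rather than run nondeterministic ops

    tok = AutoTokenizer.from_pretrained("gpt2")           # small open model, illustrative only
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Write a function that reverses a linked list."  # stand-in prompt
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=64)  # greedy, no sampling
    print(tok.decode(out[0], skip_special_tokens=True))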

replies(1): >>44981041 #
10. kentm ◴[] No.44980839[source]
A post comes to mind as an example of this: some project had a security issue reported that turned out not to be a security issue at all, and when the maintainers asked questions it became extremely obvious that someone was just feeding the conversation into an LLM. I can imagine this happening more and more as people try to slam LLM-generated code in everywhere.

Everyone promoting LLMs, especially on HN, claims to be using them expertly, with artisanal prompts and careful examination of the output, but... I'm honestly skeptical. Sure, some people are doing that (I do it from time to time). But I've seen enough slop to think that more people are throwing around code they barely understand than these advocates care to admit.

Those same people will swear that they did due diligence, but why would they admit otherwise? And do they even know what proper due diligence is? And would they still be getting their mythical 30%-50% productivity boost if they were actually doing what they claimed they were doing?

And that is a problem. I cannot have a productive code review with someone who does not even understand what their code is actually doing, much less the trade-offs made in the implementation (because they did not consider any trade-offs at all and just took what the LLM produced). If they can't have a conversation about the code because they didn't bother to read or understand any of it, then there's nothing I can do except close the PR and tell them to actually do the work this time.

replies(1): >>44981053 #
11. wahnfrieden ◴[] No.44981041{8}[source]
You can't do that with state-of-the-art LLMs, and there's no sign of that changing (the vendors like to retain control over model behavior). I would not want to use or contribute to a project that embraces LLMs yet disallows the leading models.

Besides, the output of an LLM is not really any more trustworthy (even if reproducible) than the contribution of an anonymous actor; both require review. Reproducibility from a prompt doesn't mean the output followed traceable logic that would let you skip a full manual code review, as with your mass-renaming example. LLMs produce antagonistic output from innocuous prompts from time to time, too.

12. wahnfrieden ◴[] No.44981053{3}[source]
The ghostty creator disagrees re: the productivity of unreviewed generated PRs: https://x.com/mitchellh/status/1957930725996654718