
371 points ulrischa | 4 comments
1. not2b ◴[] No.43235837[source]
If the hallucinated code doesn't compile (or, in an interpreted language, immediately throws exceptions), then yes, that isn't risky, because that code won't be used. I'm more concerned about code that appears to work for some test cases but solves the wrong problem, or solves it inadequately, and about whether we have anyone on the team who can maintain that code long-term or document it well enough that others can.
replies(2): >>43235865 #>>43237349 #
2. t14n ◴[] No.43235865[source]
FWIW this problem already exists with my more junior co-workers, and also with my own code that I write when exhausted!

If you have trusted processes for review and aren't always rushing out changes without triple-checking your work (plus a review from another set of eyes), then I think you catch a lot of the subtler bugs that an LLM emits.

replies(1): >>43244565 #
3. wavemode ◴[] No.43237349[source]
I once submitted some code for review in which the AI had inserted a recursive call to the same function being defined. The recursive call was completely unnecessary and nonsensical, but not wrong per se: it just caused the function to repeat what it was doing. The code typechecked, the tests passed, and the line was easy to miss on a cursory read through the logic. I missed it, the code reviewer missed it, and eventually it merged to production.

Unfortunately, there was one particular edge case that caused that recursive call to become an infinite loop, and I was extremely embarrassed seeing that "stack overflow" server error alert come through Slack afterward.
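
A minimal sketch of how that kind of bug slips through (hypothetical code, not the actual function from the anecdote): the stray recursive call mirrors a legitimate branch, typechecks, and never fires in the happy-path tests, but one edge case makes it recurse without bound until the stack overflows.

    from typing import Callable

    def fetch_with_retry(fetch: Callable[[], int], retries: int = 3) -> int:
        """Call fetch(), retrying on a 503 status up to `retries` times."""
        status = fetch()
        if status == 503 and retries > 0:
            return fetch_with_retry(fetch, retries - 1)
        if status == 503:
            # Stray recursive call: it mirrors the branch above and typechecks,
            # and tests whose stubbed endpoint eventually recovers never reach it.
            # But a persistently failing endpoint recurses here with `retries`
            # unchanged until the stack overflows.
            return fetch_with_retry(fetch, retries)
        return status

    # A test like this passes: the fake endpoint recovers after two failures.
    responses = iter([503, 503, 200])
    assert fetch_with_retry(lambda: next(responses)) == 200

The edge case (an endpoint that never stops returning 503) is exactly the input the tests didn't cover, which is why the extra line reads as harmless on a cursory review.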

4. not2b ◴[] No.43244565[source]
Yes, code review can catch these things. But code review for more complex issues works better when the submitter can walk the reviewers through the design and explain the details (sometimes the reviewers catch a flaw in the submitter's reasoning before they spot the issue in the code: it becomes clear that the developer didn't adequately understand the spec or the problem to be solved). If an LLM produced the code, a rigorous review process takes longer, which reduces the value of using the LLM in the first place.