    371 points ulrischa | 19 comments
    1. Terr_ ◴[] No.43238043[source]
    [Recycled from an older dupe submission]

    As much as I've agreed with the author's other posts/takes, I find myself resisting this one:

    > I'll finish this rant with a related observation: I keep seeing people say “if I have to review every line of code an LLM writes, it would have been faster to write it myself!”

    > Those people are loudly declaring that they have under-invested in the crucial skills of reading, understanding and reviewing code written by other people.

    No, that does not follow.

    1. Reviewing depends on what you know about the expertise (and trust) of the person writing it. Spending most of your day reviewing code written by familiar human co-workers is very different from the same time reviewing anonymous contributions.

    2. Reviews are not just about the code's potential mechanics, but inferring and comparing the intent and approach of the writer. For LLMs, that ranges between non-existent and schizoid, and writing it yourself skips that cost.

    3. Motivation is important; for some developers that means learning, understanding, and creating. Not wanting to do code reviews all day doesn't mean you're bad at them. Also, reviewing an LLM's code has no social aspect.

    However you do it, somebody else should still be reviewing the change afterwards.

    replies(6): >>43240863 #>>43241052 #>>43241581 #>>43243540 #>>43243749 #>>43244380 #
    2. theshrike79 ◴[] No.43240863[source]
    You can see the patterns, a.k.a. "code smells"[0], in code 20x faster than you can write code yourself.

    I can browse through any Java/C#/Go code and without actually reading every keyword see how it flows and if there's something "off" about how it's structured. And if I smell something I can dig down further and see what's cooking.
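    For example, a hypothetical C# sketch (made-up names, not from any real codebase): the shape alone smells before you read a single keyword closely.

        using System.Collections.Generic;

        class Order { public int Status; public decimal Total; }

        static class OrderReport
        {
            public static decimal TaxedTotal(List<Order> orders)
            {
                decimal total = 0;
                foreach (var o in orders)
                {
                    if (o != null)                    // defensive null check: why are nulls in here?
                    {
                        if (o.Status == 1)            // magic number: what is status 1?
                            total += o.Total * 1.21m; // inlined tax rate...
                        else if (o.Status == 2)       // magic number again
                            total += o.Total * 1.21m; // ...and a copy-pasted branch
                    }
                }
                return total;
            }
        }

    The nesting, the magic numbers, and the duplicated branches all jump out long before you reason about correctness.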

    If your chosen language is difficult/slow to read, then it's on you.

    And stuff should have unit tests with decent coverage anyway, those should be even easier for a human to check, even if the LLM wrote them too.

    [0] https://en.wikipedia.org/wiki/Code_smell

    replies(2): >>43241314 #>>43245966 #
    3. elcritch ◴[] No.43241052[source]
    > 2. Reviews are not just about the code's potential mechanics, but inferring and comparing the intent and approach of the writer. For LLMs, that ranges between non-existent and schizoid, and writing it yourself skips that cost.

    With humans you can be reasonably sure they've followed through with a mostly consistent level of care and thought. LLMs will outright lie to make their job easier in one section, while generating high-quality code in another.

    I've had to do a 'git reset --hard' after trying out Claude Code and spending $20. It always seems great at first, but it just becomes nonsense on larger changes. Maybe chain-of-thought models do better, though.

    replies(3): >>43241485 #>>43242421 #>>43246504 #
    4. skywhopper ◴[] No.43241314[source]
    Wow, what a wildly simplistic view you have of programming. “Code smells” (god, I hate that term) are not the only thing that can be wrong. Unit tests only cover what they cover. Reviewing the code is only one piece of the overall cost here.
    5. boesboes ◴[] No.43241485[source]
    I did the exact same today! It started out reasonable, but as you iterate on the commits/PR it becomes complete crap. And expensive, too, for very little value.
    6. Eridrus ◴[] No.43241581[source]
    Yeah, I strongly disagree with this too.

    I've spent a lot of time reviewing code and doing code audits for security (far more than the average engineer) and reading code still takes longer than writing it, particularly when it is dense and you cannot actually trust the comments and variable names to be true.

    AI is completely untrustworthy in that sense. The English and the code have no particular reason to align, so you really need to read the code itself.

    These models may also use unfamiliar idioms whose edge cases you don't know, so you either have to fight the model to do it a different way, or go investigate the idiom and think through the edge cases yourself if you really want to understand it.

    I think most people don't read the code these models produce at all: they just click accept, then see if the tests pass or look at the output manually.

    I am still trying to give it a go. Sometimes it really does make simpler tasks easier and I am blown away, and it has been getting better. But I feel like I need to set myself a hard timeout with these tools: if they haven't done basically what I wanted quickly, I should just start from scratch, since the task is beyond them and I'll spend more time on the back and forth.

    They are useful for giving me the motivation to do things I'm avoiding because they're too boring, though: after fighting with them for 20 minutes, I'm ready to go write the code myself.

    7. aaronbaugher ◴[] No.43242421[source]
    It's like cutting and pasting from Stack Overflow, if SO didn't have a voting system to give you some hope that the top answer at least works and wasn't hallucinated by someone who didn't understand the question.

    I asked Gemini for the lyrics of a song that I knew was on all the main lyrics sites. It gave me the lyrics to a different song with the same title. On the second try, it hallucinated a batch of lyrics. Third time, I gave it a link to the correct lyrics, and it "lied" and said it had consulted that page to get it right but gave me another wrong set.

    It did manage to find me a decent recipe for chicken salad, but I certainly didn't make it without checking to make sure the ingredients and ratios looked reasonable. I wouldn't use code from one of these things without closely inspecting every line, which makes it a pointless exercise.

    replies(1): >>43242540 #
    8. simonw ◴[] No.43242540{3}[source]
    I'm pretty sure Gemini (and likely other models too) has been deliberately engineered to avoid outputting exact lyrics, because the LLM labs know that the music industry is extremely litigious.

    I'm surprised it didn't outright reject your request, to be honest.

    replies(1): >>43244362 #
    9. mcpar-land ◴[] No.43243540[source]
    The part of their claim that does the heavy lifting is "code written by other people" - LLM-produced code does not fall into that category. LLM code is not written by anyone. There is no model in a brain that I can empathize with and reason about why they might have made this decision or that one, and no person I can potentially contact and do code review with.
    10. saghm ◴[] No.43243749[source]
    The crux of this seems to be that "reviewing code written by other people" isn't the same as "reviewing code written by LLMs". The "human" element of human-written code lets you draw on social knowledge as well as technical knowledge, and that can even be built up over time when reviewing the same person's code. Maybe there's some equivalent that people can develop when dealing with LLM code, but I don't think many people have it now (if it even exists), and I don't know what it would look like.
    11. aaronbaugher ◴[] No.43244362{4}[source]
    I wondered if it'd been banned from looking at those sites. If that's commonly known (I've only started dabbling in this stuff, so I wasn't aware of that), it's interesting that it didn't just tell me it couldn't do that, instead of lying and giving false info.
    replies(1): >>43244461 #
    12. lsy ◴[] No.43244380[source]
    I'm also put off by the author's condescension towards people who aren't convinced after using the technology. It's not the user's job to find a product useful, it's a product's job to be useful for the user. If a programmer puts a high value on being able to trust a program's output to be minimally conformant to libraries and syntax that are literally available to the program, and places a high value on not having to babysit every line of code that you review and write, that's the programmer's prerogative in their profession, not some kind of moral failing.
    13. krupan ◴[] No.43244461{5}[source]
    "it's interesting that it didn't just tell me it couldn't do that, instead of lying and giving false info."

    "Interesting" is a very kind word to use there.

    14. theshrike79 ◴[] No.43245966[source]
    -4 points and one reply, what is this, Reddit? The downvote button isn't for "I disagree".
    replies(1): >>43246580 #
    15. Terr_ ◴[] No.43246504[source]
    > With humans you can be reasonably sure they've followed through with a mostly consistent level of care and thought.

    And even if they fail, other humans are more likely to fail in ways we are familiar with and can internally model and anticipate ourselves.

    16. throwuxiytayq ◴[] No.43246580{3}[source]
    You’re catching some downvotes, but I agree with your perspective. I’m feeling very productive with LLMs and C# specifically. There are definitely some LLM outputs that I don’t even bother checking, but very often the code is visibly correct and ready for use. Ensuring that the LLM output conforms to your preferred style (e.g. functional-like with static functions) helps a lot. I usually do a quick formatting/refactoring pass with the double purpose of also understanding and checking the code. If there are doubts about correctness (usually in just one or two spots), they can be cleared up very quickly. I’m sure this workflow isn’t a great fit for every language, program type, and skill level (there are experts out there who put me to shame!), but reading some of these comments, I feel like a lot of my peers are missing out.
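    To illustrate the style (a hypothetical sketch, made-up names): small static functions, pure input to output, no hidden state, so each one can be reviewed in isolation.

        using System.Collections.Generic;
        using System.Linq;

        record OrderLine(decimal UnitPrice, int Quantity);

        static class Pricing
        {
            // Pure function: easy to eyeball, easy to unit test.
            public static decimal Subtotal(IEnumerable<OrderLine> lines) =>
                lines.Sum(l => l.UnitPrice * l.Quantity);

            public static decimal WithTax(decimal subtotal, decimal rate) =>
                subtotal * (1 + rate);
        }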
    replies(1): >>43257415 #
    17. SR2Z ◴[] No.43257415{4}[source]
    I think the reason for this gap is the difference in scope and novelty between codebases. When you need an LLM to write a piece of code that's been written a million times before (e.g. "find the normal vector to this plane", "find the highest scoring user"), it generally produces decent code.

    But on the flip side, this type of code is intrinsically less valuable than the novel stuff ("convert this signed distance field to a mesh"), which an LLM will choke on.

    replies(1): >>43260175 #
    18. throwuxiytayq ◴[] No.43260175{5}[source]
    Not sure that vector normalization and “MaxBy” count as pieces of code. They're building blocks, less than a line each; it's usually more compact to just type them out than to describe them in natural language.
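    To make that concrete (a hypothetical C# sketch, made-up type names), each "task" from the comment above is a single expression:

        using System.Collections.Generic;
        using System.Linq;
        using System.Numerics;

        record User(string Name, int Score);

        static class Snippets
        {
            // Normal of the plane through three points: textbook cross product.
            public static Vector3 PlaneNormal(Vector3 a, Vector3 b, Vector3 c) =>
                Vector3.Normalize(Vector3.Cross(b - a, c - a));

            // Highest-scoring user: one call since .NET 6 added MaxBy.
            public static User? TopScorer(IEnumerable<User> users) =>
                users.MaxBy(u => u.Score);
        }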
    replies(1): >>43274449 #
    19. SR2Z ◴[] No.43274449{6}[source]
    It really depends. A recent example: implementing the DPMO paper (a signed-distance-field-to-mesh algorithm), where one of the steps is "compute the plane of best fit for these points and project this other point onto it." Not a particularly long piece of code, but long enough that my local DeepSeek model meaningfully saved me time. A rough sketch of that step is below the link.

    http://www.sccg.sk/~chalmo/GM/SM02ob.pdf
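    For reference, a rough sketch of that step (my own, not the paper's or the model's code; it assumes the points aren't near-vertical or degenerate, so a least-squares fit of z = ax + by + c works; a robust version would use SVD on the centered points):

        using System.Collections.Generic;
        using System.Numerics;

        static class PlaneFit
        {
            // Least-squares plane z = ax + by + c through the points,
            // via the 3x3 normal equations solved with Cramer's rule.
            // Assumes det != 0, i.e. the points aren't collinear.
            public static (Vector3 Normal, Vector3 Point) Fit(IReadOnlyList<Vector3> pts)
            {
                float sxx = 0, sxy = 0, syy = 0, sx = 0, sy = 0;
                float sxz = 0, syz = 0, sz = 0;
                foreach (var p in pts)
                {
                    sxx += p.X * p.X; sxy += p.X * p.Y; syy += p.Y * p.Y;
                    sx  += p.X;       sy  += p.Y;
                    sxz += p.X * p.Z; syz += p.Y * p.Z; sz += p.Z;
                }
                float n = pts.Count;
                // | sxx sxy sx |   |a|   |sxz|
                // | sxy syy sy | * |b| = |syz|
                // | sx  sy  n  |   |c|   |sz |
                float det = sxx * (syy * n - sy * sy)
                          - sxy * (sxy * n - sy * sx)
                          + sx  * (sxy * sy - syy * sx);
                float a = (sxz * (syy * n - sy * sy)
                         - sxy * (syz * n - sy * sz)
                         + sx  * (syz * sy - syy * sz)) / det;
                float b = (sxx * (syz * n - sy * sz)
                         - sxz * (sxy * n - sy * sx)
                         + sx  * (sxy * sz - syz * sx)) / det;
                float c = (sxx * (syy * sz - syz * sy)
                         - sxy * (sxy * sz - syz * sx)
                         + sxz * (sxy * sy - syy * sx)) / det;
                // z = ax + by + c  =>  normal (a, b, -1); (0, 0, c) lies on the plane.
                return (Vector3.Normalize(new Vector3(a, b, -1)), new Vector3(0, 0, c));
            }

            // Project q onto the plane along its unit normal.
            public static Vector3 Project(Vector3 q, Vector3 normal, Vector3 point) =>
                q - Vector3.Dot(q - point, normal) * normal;
        }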