
371 points | ulrischa | 3 comments
Terr_ No.43238043
[Recycled from an older dupe submission]

As much as I've agreed with the author's other posts/takes, I find myself resisting this one:

> I'll finish this rant with a related observation: I keep seeing people say “if I have to review every line of code an LLM writes, it would have been faster to write it myself!”

> Those people are loudly declaring that they have under-invested in the crucial skills of reading, understanding and reviewing code written by other people.

No, that does not follow.

1. Reviewing depends on what you know about the expertise (and trust) of the person writing it. Spending most of your day reviewing code written by familiar human co-workers is very different from the same time reviewing anonymous contributions.

2. Reviews are not just about the code's potential mechanics, but inferring and comparing the intent and approach of the writer. For LLMs, that ranges between non-existent and schizoid, and writing it yourself skips that cost.

3. Motivation is important; for some developers that means learning, understanding and creating. Not wanting to do code reviews all day doesn't mean you're bad at them. Also, reviewing an LLM's code has no social aspect.

However you do it, somebody else should still be reviewing the change afterwards.

replies(6): >>43240863 #>>43241052 #>>43241581 #>>43243540 #>>43243749 #>>43244380 #
elcritch No.43241052
> 2. Reviews are not just about the code's potential mechanics, but inferring and comparing the intent and approach of the writer. For LLMs, that ranges between non-existent and schizoid, and writing it yourself skips that cost.

With humans you can be reasonably sure they've followed through with a mostly consistent level of care and thought. LLMs will outright lie to make their job easier in one section while generating high-quality code in another.

I've had to do a 'git reset --hard' after trying out Claude Code and spending $20. It always seems great at first, but it turns into nonsense on larger changes. Maybe chain-of-thought models do better, though.
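
For what it's worth, the cheapest insurance for that kind of experiment is to fence the tool onto a throwaway branch and checkpoint before it edits anything, so a bad run costs one reset rather than a rewrite. A minimal sketch, assuming an ordinary git checkout; the branch name and commit message are just illustrative:

    # isolate the tool's edits on a scratch branch
    git checkout -b llm-scratch
    git add -A && git commit -m "checkpoint before letting the agent edit"

    # ...let Claude Code (or whichever tool) make its changes...

    # see exactly what it did before deciding anything
    git diff HEAD

    # bad run: discard every uncommitted edit and drop the branch
    git reset --hard HEAD
    git checkout main
    git branch -D llm-scratch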

replies(3): >>43241485 #>>43242421 #>>43246504 #
aaronbaugher No.43242421
It's like cutting and pasting from Stack Overflow, if SO didn't have a voting system to give you some hope that the top answer at least works and wasn't hallucinated by someone who didn't understand the question.

I asked Gemini for the lyrics of a song that I knew was on all the main lyrics sites. It gave me the lyrics to a different song with the same title. On the second try, it hallucinated a batch of lyrics. Third time, I gave it a link to the correct lyrics, and it "lied" and said it had consulted that page to get it right but gave me another wrong set.

It did manage to find me a decent recipe for chicken salad, but I certainly didn't make it without checking to make sure the ingredients and ratios looked reasonable. I wouldn't use code from one of these things without closely inspecting every line, which makes it a pointless exercise.

replies(1): >>43242540 #
1. simonw No.43242540
I'm pretty sure Gemini (and likely other models too) have been deliberately engineered to avoid outputting exact lyrics, because the LLM labs know that the music industry is extremely litigious.

I'm surprised it didn't outright reject your request, to be honest.

replies(1): >>43244362 #
2. aaronbaugher No.43244362
I wondered if it'd been banned from looking at those sites. If that's commonly known (I've only started dabbling in this stuff, so I wasn't aware of that), it's interesting that it didn't just tell me it couldn't do that, instead of lying and giving false info.
replies(1): >>43244461 #
3. krupan No.43244461
"it's interesting that it didn't just tell me it couldn't do that, instead of lying and giving false info."

Interesting is a very kind word to use there