That certainly punctures the hype. What are LLMs good for, if the best you can hope for is to spend years learning to prompt it for unreliable results?
A tool that helps you by iteratively guessing the next token is not a "developer tool" any more than a slot machine is a wealth-building tool.
Even when I was using Visual Studio Ultimate (which has a fantastic step-through debugging environment), the debugger was only useful for the very first tests, to correct dumb mistakes.
Finding dumb mistakes is a far smaller part of the dev process than building a complex edifice of working code.
Ironically, I used it to help the robots find a pretty deep bug in some code they authored, where the plain "this code isn't working, fix it" prompt didn't gain any traction. Giving them the code with the debug statements and their output set them on the right path. Easy peasy... true, they were responsible for the bug in the first place, so I guess the humans who write bug-free code have the advantage.
The output of the print statements, as the code is iteratively built up from a skeleton to ever greater levels of functionality, is analyzed to ensure that things are working properly, in a stepwise fashion. There is no guessing in this whatsoever. It is a logical design progression from minimal functionality to complete implementation.
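As a rough sketch of that kind of progression, in Python (the file name, function names, and two-column data shape here are purely illustrative assumptions, not from any real project):

    # Stage 1: skeleton -- only prove the input arrives in the shape expected.
    def load_records(path):
        with open(path) as f:
            records = [line.rstrip("\n").split(",") for line in f]
        print(f"loaded {len(records)} records, first row: {records[0]}")
        return records

    # Stage 2: add one layer of real functionality and verify its output the same way.
    def totals_by_key(records):
        totals = {}
        for key, value in records:          # assumes rows of exactly (key, value)
            totals[key] = totals.get(key, 0) + float(value)
        print(f"computed totals for {len(totals)} keys: {totals}")
        return totals

    if __name__ == "__main__":
        records = load_records("sample.csv")   # checkpoint 1: input is correct
        totals = totals_by_key(records)         # checkpoint 2: aggregation is correct
        # Each stage's printed output is inspected against expected values
        # before the next layer of functionality is written.

The point is that each checkpoint is confirmed before moving on, so nothing downstream is built on a guess.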
Standard commercial computers never guess, and that puts constraints on how I add to their intrinsic logical data flows, i.e. I should never be guessing either.
> I guess the humans who write bug free code have the advantage.
We fanatical perfectionists are the only ones who write successful software, though perfection in function is the only perfection that can be attained. Other metrics, such as code structure, implementation environment, or UI design, are merely ancillary to the functioning of the data flows.
And I need not guess to know this fundamental truth, which is common to all engineering endeavors, though software is the only engineering pursuit (not discipline, yet) where the result is binary: either it works exactly as designed or it doesn't. We don't get to be "off by 0.1mm" unless our design specs allow some grey area, and I've never seen that in all my years of developing and modifying n-tiered RDBMS topologies, desktop apps, and even a materials science equipment test data capture system.
I saw the term "fuzzy logic" crop up a few decades ago, but have never had the occasion to use anything like that, though even that is a specific kind of algorithm that will either be implemented precisely or not.