Hallucinations in code are the least dangerous form of LLM mistakes

(simonwillison.net)

1. al2o3cr [02 Mar 25 20:51 UTC] No.43234929

>>43233903 (OP)

    My cynical side suspects they may have been looking for
    a reason to dismiss the technology and jumped at the first
    one they found.

MY cynical side suggests the author is an LLM fanboi who prefers not to think that hallucinating easy stuff strongly implies hallucinating harder stuff, and therefore jumps at the first reason to dismiss the criticism.

2. williamcotton [02 Mar 25 21:09 UTC] No.43235138

>>43234929 (TP)

What do you mean by "harder stuff"? What about an experimental DSL written in C with a recursive descent parser and a web server runtime that includes Lua, jq, a Postgres connection pool, mustache templates, a request-based memory arena, database migrations and much more? 11,000+ lines of code with ~90% written by Claude in Cursor Composer.

https://github.com/williamcotton/webdsl

Frankly, we "fanbois" are just a little sick and tired of being told that we must be terrible developers working on simple toys if we find any value in these tools!

3. dzaima [02 Mar 25 21:24 UTC] No.43235291

>>43235138

Some free code review of the first file I clicked into: https://github.com/williamcotton/webdsl/blob/92762fb724a9035..., among other places, should probably be doing the conditional "lexer->line++;" thing. It's quite a weird decision to force every code path to do that manually whenever a newline character is encountered. Could've at least made an "advance_maybe_newline(lexer);" or similar. But I guess LLMs give you copy-paste garbage.

Even the article this thread is about says:

> Just because code looks good and runs without errors doesn’t mean it’s actually doing the right thing.

4. elanora96 [02 Mar 25 21:31 UTC] No.43235372

>>43235138

I'm a strong believer that LLMs are tools, and when wielded by talented and experienced developers they sit somewhere in the same danger category as Stack Overflow and transitive dependencies. This is not a critique of your project, or really of the quality of LLMs, but when I see 90% of an 11,000+ LOC project written by Claude, it just feels sort of depressing in a way I haven't processed yet.

I love FOSS. I love browsing projects of all quality levels and vintages and seeing how things were built. I love learning new patterns and sometimes even bickering over their strengths and weaknesses. An LLM-generated codebase hardly makes me even want to engage with it...

Perhaps these feelings are somewhat analogous to hard copies vs. ebooks? My opinions have changed over time and I read and collect both. Have you had similar thoughts and gotten over them? Do you see tools like Claude in a way where this isn't an issue?

5. williamcotton [02 Mar 25 21:35 UTC] No.43235410

>>43235291

Thanks for taking a look! The lexer and parser are probably close to 100% Claude and I definitely didn't review them completely. I spent most of the time trying out different grammars (normally something you want to do before you start writing code) and runtime features! "Build the web server runtime and framework into the language" was an idea kicking around in my head for a few years, but until Cursor I didn't have the energy to play around with it.

6. williamcotton [02 Mar 25 21:41 UTC] No.43235468

>>43235372

I mean, when I'm working on something that I don't expect to be more than a throw-away experiment I'm not too worried about the code itself.

The grammar itself still seems a bit clunky and the next time I head down this path I imagine I'll go with a more hand-crafted approach.

I learned a lot about integrating Lua and jq into a project along the way (and how to make it performant), something I had no prior experience with.

7. ianbutler [02 Mar 25 21:56 UTC] No.43235622

>>43235291

Okay, so this is a personal opinion, right? Like, where is the objectivity in your review?

What hard performance characteristics are being violated? Is there functional incorrectness? Or is this just "it's against my sensibilities"? Because at the end of the day, frankly, no one agrees on how to develop anything.

The thing I see a lot of developers struggle with is that just because something doesn't fit your mental model doesn't make it objectively bad.

So unless it's objectively wrong or worse in a measurable characteristic, I don't know that it matters.

For the record I'm not asserting it is right, I'm just saying I've seen a lot of critiques of LLM code boil down to "it's not how I'd write it" and I wager that holds for every developer you'll ever interact with.

8. Snuggly73 [02 Mar 25 22:11 UTC] No.43235757

>>43235138

..."request-based memory arena"...

There are some very questionable things going on with the memory handling in this code. Just saying.

9. semi-extrinsic [02 Mar 25 22:23 UTC] No.43235877

>>43235138

Honest question: this looks like a library others can use to build websites. It contains features related to authentication and security. If it's 90% LLM generated, how do you sleep at night? I'd be dead scared someone would use this, hit a bug that leaks PII (or worse) and then sue me into oblivion.

10. KoolKat23 [02 Mar 25 22:38 UTC] No.43236010

>>43235622

I agree, it seems a lot of the complaints boil down to academic reasons.

Fine, it's not the best, and it may run into some longer-term issues, but most importantly it works at this point in time.

A snobby/academic equivalent would be someone using an obscure language such as COBOL.

The world continues to turn.

11. williamcotton [02 Mar 25 22:52 UTC] No.43236128

>>43235877

"WebDSL is an experimental domain-specific language and server implementation for building web applications."

And it's MIT:

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  SOFTWARE.

12. dzaima [02 Mar 25 22:55 UTC] No.43236155

>>43235622

OP didn't put much effort into writing the code, so I'm certainly not putting much effort into a proper review of it, for no benefit to me no less. I just wanted to see what quality AI gets you, and made a comment about it.

I'm pretty sure the code not having the "if (…) lexer->line++" in places is just a plain simple repeated bug that'll result in wrong line numbers for certain inputs.

And human-wise, I'd say the simple way to avoid that bug would've been to create or change an abstraction around the second time writing "if (…) lexer->line++", so that it takes effort to do it incorrectly; the linked code instead makes getting it wrong the default, with no indication that there's anything to get wrong. The point being that bad abstractions are not just a maintenance nightmare, but also make code review (which is extra important with LLM code) harder.
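A minimal C sketch of the abstraction dzaima suggests, assuming a hypothetical `Lexer` with `src`, `pos`, and `line` fields (webdsl's actual struct and function names may differ): a single advance function owns the newline check, so no call site can forget `lexer->line++`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer state for illustration; webdsl's real struct may differ. */
typedef struct {
    const char *src;  /* NUL-terminated source text */
    size_t pos;       /* index of the next unread character */
    int line;         /* 1-based line number of the next unread character */
} Lexer;

/* The only function that moves the cursor forward. Because the newline
   check lives here, no call site can forget to bump `line`. */
static char lexer_advance(Lexer *lexer) {
    assert(lexer != NULL && lexer->src != NULL);
    char c = lexer->src[lexer->pos];
    if (c == '\0') {
        return c;  /* end of input: never step past the terminator */
    }
    lexer->pos++;
    if (c == '\n') {
        lexer->line++;
    }
    return c;
}

/* Look at the next character without consuming it. */
static char lexer_peek(const Lexer *lexer) {
    return lexer->src[lexer->pos];
}

/* Example caller: skips whitespace with no manual line bookkeeping. */
static void skip_whitespace(Lexer *lexer) {
    char c = lexer_peek(lexer);
    while (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
        lexer_advance(lexer);
        c = lexer_peek(lexer);
    }
}
```

For example, running `skip_whitespace` over `"  \n\t\nfoo"` starting at line 1 leaves `line` at 3 with the cursor on `f`; callers never touch `line` directly, so getting it wrong would take deliberate effort.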

13. williamcotton [02 Mar 25 22:56 UTC] No.43236158

>>43235757

Request-based memory arenas are pretty standard for web servers!
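The usual shape of a request-scoped arena, sketched in C: each allocation made while handling a request bumps an offset into one buffer, and the whole buffer is released in a single step when the response is done. The names (`Arena`, `arena_init`, `arena_alloc`, `arena_reset`, `arena_destroy`) are hypothetical and not taken from webdsl; this simplified version uses one fixed block and returns `NULL` when it runs out, where a production arena would typically chain extra blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-request arena: one fixed block, bump allocation. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} Arena;

/* Returns 1 on success, 0 if the backing malloc failed. */
static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Returns `size` bytes aligned to `align` (a power of two), or NULL if
   the arena is exhausted. Individual allocations are never freed. */
static void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (a->base == NULL) {
        return NULL;
    }
    uintptr_t cur = (uintptr_t)(a->base + a->used);
    size_t pad = (size_t)((align - (cur & (align - 1))) & (align - 1));
    if (pad > a->cap - a->used || size > a->cap - a->used - pad) {
        return NULL;  /* not enough room left: caller must handle this */
    }
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* End of request: everything allocated from the arena is released at once. */
static void arena_reset(Arena *a) {
    a->used = 0;
}

static void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

A server using this pattern would call `arena_reset` after each response, so per-request strings and structs never need individual `free` calls; the trade-off is that every `arena_alloc` result still has to be checked for `NULL`.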

14. Snuggly73 [02 Mar 25 23:38 UTC] No.43236510

>>43236158

Maybe so; after all, I don't write web servers (btw, the PQ and JQ libraries don't seem to use the arena allocator, which makes the whole proposition a bit dubious, but let's say that's me being picky).

What I meant was that, IMO, the code is not very robust when dealing with memory allocations:

1. The "string builder", for example, silently ignores allocation failures and just happily returns: https://github.com/williamcotton/webdsl/blob/92762fb724a9035...

2. In what seems like most places, the code simply doesn't check for allocation failures, which leads to overruns (just a couple of examples):

https://github.com/williamcotton/webdsl/blob/92762fb724a9035...
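For contrast with the silent failure described in point 1, here is a minimal C sketch of a growable string builder that reports allocation failure to its caller instead of returning as if nothing happened. `StrBuf`, `strbuf_append`, and `strbuf_free` are hypothetical names, not the project's API, and this version uses plain `realloc` rather than an arena.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; NUL-terminated whenever data != NULL. */
typedef struct {
    char *data;
    size_t len;  /* bytes used, excluding the terminator */
    size_t cap;  /* bytes allocated, including room for the terminator */
} StrBuf;

/* Appends `n` bytes from `s`. Returns false, leaving `sb` unchanged and
   still valid, if the new size would overflow or realloc fails. */
static bool strbuf_append(StrBuf *sb, const char *s, size_t n) {
    assert(sb != NULL && (s != NULL || n == 0));
    if (n > SIZE_MAX - sb->len - 1) {
        return false;  /* len + n + 1 would overflow size_t */
    }
    size_t need = sb->len + n + 1;
    if (need > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < need) {
            new_cap = (new_cap > SIZE_MAX / 2) ? need : new_cap * 2;
        }
        char *p = realloc(sb->data, new_cap);
        if (p == NULL) {
            return false;  /* old buffer is untouched; the caller decides */
        }
        sb->data = p;
        sb->cap = new_cap;
    }
    if (n > 0) {
        memcpy(sb->data + sb->len, s, n);
    }
    sb->len += n;
    sb->data[sb->len] = '\0';
    return true;
}

static void strbuf_free(StrBuf *sb) {
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

Call sites then have to check the returned `bool` and propagate the failure, which is the kind of check point 2 says is missing in many places.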

15. williamcotton [02 Mar 25 23:53 UTC] No.43236644

>>43236510

Thanks for digging in. Yup, those two libs don’t support custom allocators. I raised an issue in the jq repo to ask if they thought about adding it.

Great points about happy-path allocations. If I ever touch the project again, I’ll check each location.

Note to self: free code reviews of projects if you mention LLMs!

16. goosejuice [03 Mar 25 00:18 UTC] No.43236817

>>43235372

You're romanticizing software, placing more value in the code than in the outcome. There's nothing wrong with that, but most people who use software don't think about it that way.

17. simonw [03 Mar 25 03:11 UTC] No.43237917

>>43234929 (TP)

I find it a bit surprising that I'm being called an "LLM fanboy" for writing an article with the title "Hallucinations in code are the least dangerous form of LLM mistakes" where the bulk of the article is about how you can't trust LLMs not to make far more serious and hard-to-spot logic errors.

18. rhubarbtree [03 Mar 25 07:26 UTC] No.43239166

>>43235138

I’m always really sceptical of any “proof by example” that is essentially anecdotal.

If this is going to be your argument, you need a solid scientific approach: a study where N developers are given access to a tool vs. N who are not, with controls in place, etc.

Because the overwhelming majority of coders I speak to are saying exactly the same thing: LLMs are a small productivity boost. And the majority of Cursor users, admittedly a much smaller number, are saying it just gets stuck playing whack-a-mole. Common sense says these are the expected outcomes, so we are going to need really rigorous work to convince people that LLMs can build 90% of most deeply technical projects. Exceptional results require exceptional evidence.

And when we do see anecdotal incidents that diverge so far from the norm, it makes you wonder how that can be: is this really objective, or are we in some kind of ideological debate?

19. namaria [03 Mar 25 08:55 UTC] No.43239764

>>43235138

Protip: when you block a user on GitHub, it lets you add a note as to why, which will show on their profile. It will also alert you when you view a repository to which that user has contributed.

20. namaria [03 Mar 25 09:28 UTC] No.43239971

>>43236644

"People took a cursory look at a codebase I published and found glaring mistakes they discussed publicly as examples of how bad it is" is not the flex you think it is.

21. [03 Mar 25 12:51 UTC] No.43241252

>>43239764

22. williamcotton [03 Mar 25 13:05 UTC] No.43241341

>>43239971

"Cursory", get it? I did indeed make it with Cursor! ;)

I hope you find yourself having a better day today than yesterday.

23. namaria [03 Mar 25 16:20 UTC] No.43243292

>>43241341

I hope you stop peddling AI slop
