Hallucinations in code are the least dangerous form of LLM mistakes

(simonwillison.net)

371 points ulrischa | 2 comments | 02 Mar 25 19:15 UTC

verbify ◴[03 Mar 25 02:51 UTC] No.43237785[source]▶

An anecdote: I was working for a medical centre, and had some code that was supposed to find the 'main' clinic of a patient.

The specification was to only look at clinical appointments, and find the most recent appointment. However if the patient didn't have a clinical appointment, it was supposed to find the most recent appointment of any sort.

I wrote the code by sorting the data (first by clinical/non-clinical, then by date). I asked ChatGPT to document it. It misunderstood the code and got the sorting backwards.
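For anyone who wants the spec in runnable form, here is a minimal sketch in Python (not the original Power Query code, which isn't shown here). The field names `clinic`, `kind` and `date` and the sample rows are made up for illustration. It uses an explicit boolean key rather than sorting on the text of the label, so the direction of the comparison is not left to string ordering.

```python
from datetime import date

# Hypothetical appointment records; field names and values are illustrative only.
appointments = [
    {"clinic": "North", "kind": "Non-Clinical", "date": date(2025, 2, 20)},
    {"clinic": "South", "kind": "Clinical", "date": date(2025, 1, 5)},
    {"clinic": "East", "kind": "Clinical", "date": date(2024, 11, 12)},
]


def main_clinic(rows):
    """Return the clinic of the preferred appointment, or None if there are none."""
    if not rows:
        return None
    # Tuple key: clinical appointments outrank non-clinical ones (True > False),
    # and within the same kind the later date wins.
    best = max(rows, key=lambda r: (r["kind"] == "Clinical", r["date"]))
    return best["clinic"]


print(main_clinic(appointments))  # South
```

The most recent appointment overall is the non-clinical one at "North", but the clinical appointment at "South" wins because the clinical flag is compared first. If a patient has only non-clinical appointments, the same key falls through to the most recent of those.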

I was pretty surprised, and after testing with foo-bar examples eventually realised that I had called the clinical-non-clinical column "Clinical", which confused the LLM.

This is the kind of mistake that is a lot worse than "code doesn't run" - being seemingly right but wrong is far more dangerous than being obviously wrong.

replies(1): >>43238787 #

1. zahlman ◴[03 Mar 25 06:04 UTC] No.43238787[source]▶

>>43237785 #

To be clear, by "clinical-non-clinical", you mean a boolean flag for whether the appointment is clinical?

replies(1): >>43243250 #

2. verbify ◴[03 Mar 25 16:17 UTC] No.43243250[source]▶

>>43238787 (TP) #

Yes, although we weren't using a boolean.

(There was a reason for this: the field was used elsewhere within a PowerBI model, the clinicians couldn't get their heads around True/False, and PowerBI doesn't have an easy way to map True/False values to strings, so we used 'Clinical'/'Non-Clinical' as string values.)

I am reluctant to share the code example, because I'm preciously guarding an example of an LLM making an error in the hope that I'll be able to benchmark models using it. However, here's the Power Query code (which you can put into Excel). Ask an LLM to explain the code or predict what the output will look like, and compare that with what you get in Excel.

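let
    // Three sample rows in a single text column, "Foo"
    MyTable = #table(
        {"Foo"},
        {
            {"ABC"},
            {"BCD"},
            {"CDE"}
        }
    ),
    // Flag each row with the string "B" or "NotB", in a column that is itself named "B"
    AddedCustom = Table.AddColumn(
        MyTable,
        "B",
        each if Text.StartsWith([Foo], "LIAS") or Text.StartsWith([Foo], "B")
             then "B"
             else "NotB"
    ),
    // Sort on the flag column as text, descending
    SortedRows = Table.Sort(
        AddedCustom,
        {{"B", Order.Descending}}
    )
in
    SortedRows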

I believe the issue arises because the column that sorts B/NotB is also called 'B' (i.e. the Clinical/Non-Clinical column was simply called 'Clinical', which is not an amazing naming convention).
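As a cross-check outside Excel, the same steps can be mirrored in Python. This is a rough sketch, not a claim about how Power Query evaluates the query: it assumes the flag column is compared as plain text, and it doesn't try to reproduce how Excel orders the two tied "NotB" rows.

```python
# Same three sample values as the "Foo" column in the query above.
rows = [{"Foo": "ABC"}, {"Foo": "BCD"}, {"Foo": "CDE"}]

# Add the flag column, which (as in the query) is itself named "B".
for row in rows:
    row["B"] = "B" if row["Foo"].startswith(("LIAS", "B")) else "NotB"

# Descending sort on the flag column, compared as text.
sorted_rows = sorted(rows, key=lambda r: r["B"], reverse=True)

for row in sorted_rows:
    print(row["Foo"], row["B"])
```

Because "NotB" sorts after "B" as text, a descending sort puts the "NotB" rows first and the one row flagged "B" (BCD) last. A reader who assumes "descending on B" means "B rows on top" gets it backwards.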
