371 points ulrischa | 2 comments
tigerlily ◴[] No.43237317[source]
When you go from the adze to the chainsaw, be mindful that you still need to sharpen the chainsaw, top up the chain bar oil, and wear chaps.

Edit: oh and steel capped boots.

Edit 2: and a face shield and ear defenders. I'm all tuckered out like Grover in his own alphabet.

replies(2): >>43237521 #>>43239622 #
1. Telemakhos ◴[] No.43237521[source]
I'm not remotely convinced that LLMs are a chainsaw, unless they've been very thoroughly trained on the problem domain. LLMs are good for vibe coding, and some of them (Grok 3 is actually good at this) can speak passable Latin, but try getting them to compose Sotadean verse in Latin or to put a penthemimeral caesura in an iambic trimeter in ancient Greek. They can define a penthemimeral caesura and an iambic trimeter, but they don't understand the concepts and can't apply one to the other. All they can do is spit out the next probable token. Worse, LLMs have lied to me about the definition of Sotadean verse, not even regurgitating what Wikipedia should have taught them.

Image-generating AIs are really good at producing passable human forms, but they'll fail at generating anything realistic for dice, even though dice are just cubes with marks on them. Ask them to illustrate the Platonic solids, which you can find well-illustrated with a Google image search, and you'll get a bunch of lumps, some of which might resemble shapes. They don't understand the concepts; they just work off probability. But those probabilities look fairly good in domains like human forms, because the models have been specially trained on them.

LLMs seem amazing in a relatively small number of problem domains, and they seem amazing precisely because they've been extensively trained on those domains. When you ask for something outside them, their failure to work from inductions about reality (like "dice are a species of cube, differentiated from other cubes by having dots on them") or to apply concepts becomes patent, and the chainsaw looks a lot like an adze that you spend more time correcting than getting correct results from.

replies(1): >>43242869 #
2. aaronbaugher ◴[] No.43242869[source]
When I was tutoring algebra, I sometimes ran into students who could solve the problems in the book, but if I wrote a problem that looked a little different or that combined two of the concepts they'd supposedly learned, they would be lost. I gradually realized that they didn't understand the concepts at all, but had learned to follow patterns. ("When it's one fraction divided by another fraction, flip the second fraction over and multiply. Why? No idea, but I get an A.")
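
(For what it's worth, the "why" behind that rule is that dividing by a fraction is the same as multiplying by its reciprocal. A quick sketch with Python's fractions module, using made-up example values, just to show the two routes agreeing:)

    from fractions import Fraction

    a_over_b = Fraction(3, 4)   # 3/4
    c_over_d = Fraction(2, 5)   # 2/5

    # "Flip the second fraction over and multiply"
    flipped = Fraction(c_over_d.denominator, c_over_d.numerator)  # 5/2
    by_rule = a_over_b * flipped

    # Dividing directly gives the same answer, because dividing by c/d
    # is multiplying by its reciprocal d/c.
    by_division = a_over_b / c_over_d

    assert by_rule == by_division == Fraction(15, 8)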

This feels like that: a "student" who can produce the right answers as long as you stick to a certain set of questions he's been trained on through repetition, but who is hopeless on anything outside that set, even though someone who actually understood the material could easily reason from it to the new question.