
371 points | ulrischa | 1 comment | source
1. xlii | No.43238804 | source
> With code you get a powerful form of fact checking for free. Run the code, see if it works.

Um. No.

This is an oversimplification that falls apart in any system with even a minimal level of complexity.

Over my career I’ve encountered plenty of consequences caused by reliability failures. Code that ran fine, yet the side effects of not processing something, processing it too slowly, or processing it twice had serious consequences, both financial and personal.
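A minimal sketch of the "processing it twice" case, using hypothetical names: the handler below runs without error either way, so "run the code, see if it works" catches nothing, but without the idempotency guard a redelivered message would charge the account twice.

```python
# Hypothetical sketch: at-least-once delivery means the same message can
# arrive twice. The code "works" in both runs; only the guard prevents a
# double charge.
processed: set[str] = set()
balance = {"alice": 100}

def charge(request_id: str, account: str, amount: int) -> None:
    # Idempotency guard: a redelivered message is silently dropped.
    if request_id in processed:
        return
    processed.add(request_id)
    balance[account] -= amount

# The broker redelivers the same message:
charge("req-1", "alice", 30)
charge("req-1", "alice", 30)  # duplicate: ignored, balance stays at 70
```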

And those weren’t "nuclear power plant management" levels of critical. I often think back to an educational game used at a school, where losing a single save file cost a couple thousand dollars in reimbursement.

https://xlii.space/blog/network-scenarios/

This is a cheatsheet I made for my colleagues: the things we need to keep in mind when designing the system I work on. Rarely does any LLM think about them. It’s not a popular branch of engineering by any means, but it’s there.
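One classic scenario of this kind (my illustration, not necessarily from the linked cheatsheet): a client-side timeout tells you nothing about whether the server actually did the work, so "the call failed" must be treated as "outcome unknown".

```python
# Hedged illustration: the server commits the operation, but replies too
# slowly, so the client's view is a timeout, not a failure.
committed = []

def server(op: str, delay_s: float) -> str:
    committed.append(op)           # the work happens...
    if delay_s > 1.0:              # ...but the reply arrives too late
        raise TimeoutError("client gave up waiting")
    return "ok"

try:
    server("transfer-42", delay_s=2.0)
    outcome = "confirmed"
except TimeoutError:
    outcome = "unknown"            # NOT "failed": the op may have committed
```

Here `outcome` is `"unknown"` while `committed` already contains the operation, which is exactly the gap a naive "run it and see" check misses.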

To date I’ve yet to name a single instance where ChatGPT-produced code actually saved me time. I’ve seen macro-generation code recommended for Go (Go doesn’t have macros), object mutation for Elixir (Elixir doesn’t have objects, only immutable structs), list splicing in Fennel (Fennel doesn’t have splicing), language-feature pragmas ported over from another language, and a raw byte representation of memory in Rust where the code used UTF-8 string parsing to produce it. My trust in any non-ephemeral generated code is below zero.

It’s exhausting and annoying. It feels like interacting with Calvin’s dad (of Calvin and Hobbes), but with all the humor taken away.