The "confident idiot" problem: Why AI needs hard rules, not vibe checks

(steerlabs.substack.com)

323 points steerlabs | 1 comments | 04 Dec 25 20:48 UTC | HN request time: 0.306s | source

Show context

jqpabc123 ◴[04 Dec 25 21:38 UTC] No.46153440[source]▶

We are trying to fix probability with more probability. That is a losing game.

Thanks for pointing out the elephant in the room with LLMs.

The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.

replies(17): >>46155764 #>>46191721 #>>46191867 #>>46191871 #>>46191893 #>>46191910 #>>46191973 #>>46191987 #>>46192152 #>>46192471 #>>46192526 #>>46192557 #>>46192939 #>>46193456 #>>46194206 #>>46194503 #>>46194518 #

HarHarVeryFunny ◴[08 Dec 25 13:20 UTC] No.46191893[source]▶

>>46153440 #

The factuality problem with LLMs isn't because they are non-deterministic or statistically based, but simply because they operate at the level of words, not facts. They are language models.

You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place. All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C". The LLM wasn't designed to know or care that outputting "B" would represent a lie or hallucination, just that it's a statistically plausible potential next word.

replies(7): >>46192027 #>>46192141 #>>46192198 #>>46192246 #>>46193031 #>>46193526 #>>46194287 #

1. biophysboy ◴[08 Dec 25 16:31 UTC] No.46194287[source]▶

>>46191893 #

I think this is why I get much more utility out of LLMs with writing code. Code can fail if the syntax is wrong; small perturbations in the text (e.g. add a newline instead of a semicolon) can lead to significant increases in the cost function.

Of course, once an LLM is asked to create a bespoke software project for some complex system, this predictability goes away, the trajectory of the tokens succumbs to the intrinsic chaos of code over multi-block length scales, and the result feels more arbitrary and unsatisfying.

I also think this is why the biggest evangelists for LLMs are programmers, while creative writers and journalists are much more dismissive. With human language, the length scale over which tokens can be predicted is much shorter. Even the "laws" of grammar can be twisted or ignored entirely. A writer picks a metaphor because of their individual reading/life experience, not because its the most probable or popular metaphor. This is why LLM writing is so tedious, anodyne, sycophantic, and boring. It sounds like marketing copy because the attention model and RL-HF encourage it.

↑