323 points steerlabs | 5 comments
1. steerlabs No.46155755
OP here. I wrote this because I got tired of agents confidently guessing answers when they should have asked for clarification (e.g. guessing "Springfield, IL" instead of asking "Which state?" when asked "weather in Springfield").

I built an open-source library to enforce these logic/safety rules outside the model loop: https://github.com/imtt-dev/steer
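As a rough illustration of the "rules outside the model loop" idea (a hypothetical sketch, not steer's actual API — every name here is made up), a deterministic guard can intercept ambiguous input before the model is ever called:

    import re

    # Hypothetical rule enforced outside the model loop.
    AMBIGUOUS_CITIES = {"springfield", "portland", "columbus"}  # exist in many states

    def clarification_rule(query: str) -> str | None:
        """Return a clarifying question for ambiguous weather queries, else None."""
        m = re.search(r"weather in (\w+)\s*$", query.lower())
        if m and m.group(1) in AMBIGUOUS_CITIES:
            return f"Which state's {m.group(1).title()} do you mean?"
        return None

    def call_model(query: str) -> str:
        return f"(LLM answer for: {query})"  # stand-in for the real model call

    def answer(query: str) -> str:
        question = clarification_rule(query)  # deterministic, model-free check
        return question if question else call_model(query)

    print(answer("weather in Springfield"))  # asks for the state instead of guessing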

replies(3): >>46191943 #>>46191953 #>>46196578 #
2. condiment No.46191943
This approach kind of reminds me of taking an open-book test. Performing mandatory verification against a ground truth is like taking the test, then going back to your answers and looking up whether they match.

Unlike a student, the LLM never arrives at a sort of epistemic coherence, where it knows what it knows, how it knows it, and how likely it is to be true. So you have to structure every problem into a format where the response can be evaluated against an external source of truth.
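
Concretely (a hedged sketch — the lookup table and helper are illustrative stand-ins for whatever external source of truth you have), "structuring the problem" means forcing the answer into a form you can check mechanically:

    # Sketch: evaluate a model's answer against an external ground truth.
    GROUND_TRUTH = {"capital of France": "Paris"}  # stand-in reference data

    def verify(question: str, model_answer: str) -> bool:
        expected = GROUND_TRUTH.get(question)
        return expected is not None and expected.lower() in model_answer.lower()

    print(verify("capital of France", "The capital of France is Paris."))  # True
    print(verify("capital of France", "It's Lyon."))                       # False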

3. amorroxic No.46191953
Thanks a lot for this. Also one question in case anyone could shed a bit of light: my understanding is that setting temperature=0, top_p=1 would cause deterministic output (identical output given identical input). It certainly won't prevent factually wrong replies/hallucination; it only maintains generation consistency (e.g. classification tasks). Is this universally correct or is it dependent on the model used? (Or is that understanding downright wrong?)
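
One way to probe this empirically (sketch assumes the OpenAI Python SDK with an API key in the environment; the model name and prompt are arbitrary) is to repeat the same request and count distinct outputs:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    def sample(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # arbitrary model choice for the probe
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            top_p=1,
        )
        return resp.choices[0].message.content

    outputs = {sample("Label the sentiment of 'great movie' as pos or neg.")
               for _ in range(5)}
    print(len(outputs))  # 1 if the endpoint is deterministic for this input
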
replies(1): >>46196141 #
4. antonvs No.46196141
> my understanding is that setting temperature=0, top_p=1 would cause deterministic output (identical output given identical input).

That's typically correct. Many models are implemented this way deliberately. I believe it's true of most or all of the major models.

> Is this universally correct or is it dependent on model used?

There are implementation details that lead to uncontrollable non-determinism if they're not prevented within the model implementation. See e.g. the PyTorch docs on CUDA convolution determinism: https://docs.pytorch.org/docs/stable/notes/randomness.html#c...

That documents settings like this:

    torch.backends.cudnn.deterministic = True 
Parallelism can also be a source of non-determinism if it's not controlled for, either explicitly or implicitly via dependencies.
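
For completeness, the usual PyTorch-side knobs from those randomness notes look like this (a sketch; this controls seeding and kernel selection within your own process, not serving-side effects like batching):

    import torch

    torch.manual_seed(0)                       # fix the RNG state
    torch.use_deterministic_algorithms(True)   # error on non-deterministic ops
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable autotuned kernel selection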
5. janalsncm No.46196578
You should use structured output rather than checking and rechecking for valid JSON. It can't solve all of your problems, but it can enforce a schema on the output format.
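
For instance (a hedged sketch assuming an OpenAI-style JSON-schema response_format; the schema itself is made up), the schema is enforced at decode time rather than validated after the fact:

    from openai import OpenAI

    client = OpenAI()

    schema = {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "state": {"type": "string"},
        },
        "required": ["city", "state"],
        "additionalProperties": False,
    }

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary model choice
        messages=[{"role": "user", "content": "weather in Springfield, IL"}],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "location", "schema": schema, "strict": True},
        },
    )
    print(resp.choices[0].message.content)  # output is constrained to the schema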