
The "confident idiot" problem: Why AI needs hard rules, not vibe checks

(steerlabs.substack.com)

323 points steerlabs | 1 comments | 04 Dec 25 20:48 UTC


keiferski [08 Dec 25 13:46 UTC] No.46192154

>>46152838 (OP) #

The thing that bothers me the most about LLMs is how they never seem to understand "the flow" of an actual conversation between humans. When I ask a person something, I expect them to give me a short reply which includes another question/asks for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive at the same shared meaning.

LLMs don't do this. Instead, every question is immediately answered with extreme confidence, in a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human conversation I described above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.


1. chemotaxis [08 Dec 25 19:34 UTC] No.46196588

This is not necessarily a fundamental limitation. It's a consequence of a fine-tuning process where human raters decide how "good" an answer is. They're not rating the flow of the conversation, but looking at how complete / comprehensive the answer to a one-shot question looks. This selects for walls of overconfident text.

Another thing the vendors are selecting for is safety / PR risk. If an LLM answers a hobby chemistry question in a matter-of-fact way, that's a disastrous PR headline in the making. If it opens with several paragraphs of disclaimers or just refuses to answer, that's a win.