    323 points by steerlabs | 13 comments
    keiferski ◴[] No.46192154[source]
    The thing that bothers me the most about LLMs is how they never seem to understand "the flow" of an actual conversation between humans. When I ask a person something, I expect them to give me a short reply that includes a follow-up question or a request for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive at the same shared meaning.

    LLMs don't do this. Instead, every question is immediately answered with extreme confidence, in a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human one I mentioned above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.

    replies(37): >>46192230 #>>46192268 #>>46192346 #>>46192427 #>>46192525 #>>46192574 #>>46192631 #>>46192754 #>>46192800 #>>46192900 #>>46193063 #>>46193161 #>>46193374 #>>46193376 #>>46193470 #>>46193656 #>>46193908 #>>46194231 #>>46194299 #>>46194388 #>>46194411 #>>46194483 #>>46194761 #>>46195048 #>>46195085 #>>46195309 #>>46195615 #>>46195656 #>>46195759 #>>46195794 #>>46195918 #>>46195981 #>>46196365 #>>46196372 #>>46196588 #>>46197200 #>>46198030 #
    Archelaos ◴[] No.46192525[source]
    I never expected LLMs to be like an actual conversation between humans. The model is in some respects more capable and in some respects more limited than a human. I mean, one could strive for an exact replica of a human -- but for what purpose? The whole thing is a huge association machine. It is a surrealistic inspiration generator for me. This is how it works at the moment, until the next breakthrough ...
    replies(3): >>46192637 #>>46192799 #>>46193165 #
    1. wongarsu ◴[] No.46192799[source]
    > but for what purpose?

    I recently introduced a non-technical person to Claude Code, and this non-human behavior was a big sticking point. They tried to talk to Claude the way they would to a human, presenting it one piece of information at a time. With humans this is generally beneficial: they will either nod for you to continue or ask clarifying questions. With Claude this does not work well; you have to infodump as much as possible in each message.

    So even from the perspective of "how do we make this automaton into the best tool", a more human-like conversation flow might be beneficial. And that doesn't seem beyond today's technological capabilities at all; it's just not what we encourage with today's RLHF.

    replies(5): >>46193142 #>>46193143 #>>46193180 #>>46193774 #>>46195784 #
    2. monerozcash ◴[] No.46193142[source]
    I haven't tried Claude, but Codex manages this fine as long as you prompt it correctly to get started.

    A lazy example:

    "This goal of this project is to do x. Let's prepare a .md file where we spec out the task. Ask me a bunch of questions, one at a time, to help define the task"

    Or you could simply ask it to be more conversational, instead of just asking questions. It will do that.
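
    For the API-inclined, roughly the same trick as a standalone script. A minimal sketch, assuming the OpenAI Python SDK; the model name, system prompt, and "/done" convention are all illustrative, not anything Codex-specific:

        # pip install openai
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        # Illustrative system prompt: force one clarifying question per turn.
        SYSTEM = ("You are helping spec out a task into a .md file. "
                  "Ask clarifying questions one at a time; do not write "
                  "the spec until the user says the requirements are done.")

        messages = [{"role": "system", "content": SYSTEM}]
        while True:
            user = input("> ")
            if user.strip() == "/done":  # illustrative end-of-spec marker
                break
            messages.append({"role": "user", "content": user})
            reply = client.chat.completions.create(model="gpt-4o",
                                                   messages=messages)
            answer = reply.choices[0].message.content
            print(answer)
            messages.append({"role": "assistant", "content": answer})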

    3. falcor84 ◴[] No.46193143[source]
    I often find myself in these situations where I'm afraid that if I don't finish infodumping everything in a single message, it'll go in the wrong direction. So what I've been doing is switching it back to Plan Mode (even when I don't need a plan as such), just as a way of telling it "Hold still, we're still having a conversation".
    replies(2): >>46194367 #>>46195852 #
    4. paddleon ◴[] No.46193180[source]
    Also, this is what chat-style interfaces encourage. Anything where the "enter" key sends the message instead of creating a paragraph block is just hell.

    I'm prompting Gemini, and I write:

    I have the following code, can you help me analyze it? <press return>

    <expect to paste the code into my chat window>

    but Gemini is already generating output, usually saying "I'm waiting for you to enter the code".

    replies(2): >>46194308 #>>46194489 #
    5. HPsquared ◴[] No.46193774[source]
    I usually do the "drip feed" with ChatGPT, but maybe that's not optimal. Hmm, maybe info dump is a good thing to try.
    replies(1): >>46194523 #
    6. TheGoddessInari ◴[] No.46194308[source]
    Like many chat-style interfaces, it's typically shift-enter to insert a newline.
    replies(1): >>46195102 #
    7. rkj93 ◴[] No.46194367[source]
    I do this with Cursor AI too. I tell it: don't change anything, let me hear what you plan to fix and what you will change.
    8. lkbm ◴[] No.46194489[source]
    Yeah, seems like current models might benefit from a more email-like UI, and this'll be more true as they get longer task time horizons.

    Maybe we want a smaller model tuned for back and forth to help clarify the "planning doc" email. Makes sense that having it all in a single chat-like interface would create confusion and misbehavior.
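
    A back-of-the-envelope sketch of that split, assuming a generic chat-completions API; both model names and the "READY" convention are placeholders:

        from openai import OpenAI

        client = OpenAI()

        def chat(model, messages):
            resp = client.chat.completions.create(model=model, messages=messages)
            return resp.choices[0].message.content

        # Stage 1: a small, cheap model handles the back-and-forth clarification.
        clarify = [{"role": "system", "content":
                    "Gather requirements, one question per turn. "
                    "Reply with exactly READY when the spec is complete."}]
        while True:
            question = chat("gpt-4o-mini", clarify)  # placeholder model name
            if question.strip() == "READY":
                break
            clarify.append({"role": "assistant", "content": question})
            clarify.append({"role": "user", "content": input(question + "\n> ")})

        # Stage 2: hand the distilled spec to the bigger model in one shot.
        spec = "\n".join(m["content"] for m in clarify if m["role"] == "user")
        print(chat("gpt-4o", [{"role": "user",
                               "content": "Write a planning doc for:\n" + spec}]))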

    9. lkbm ◴[] No.46194523[source]
    There's a recent(ish: May 2025) paper about how drip-feeding information is worse than restarting with a revised prompt once you realize details are missing.[0]

    [0] https://arxiv.org/abs/2505.06120
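
    In code terms, that finding argues for this pattern: rebuild one consolidated prompt and restart with a fresh context, instead of patching a long transcript. A sketch, with a placeholder model name and prompts:

        from openai import OpenAI

        client = OpenAI()

        def ask(prompt):
            # Every call is a fresh single-turn context: no accumulated
            # conversational history left to mislead the model.
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder model name
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content

        draft = ask("Summarize this design doc: <doc>")
        # A detail turned out to be missing -> revise the prompt and rerun,
        # rather than replying "no, I also meant X" in-thread.
        better = ask("Summarize this design doc, focusing on the "
                     "migration plan: <doc>")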

    replies(1): >>46201780 #
    10. bwat49 ◴[] No.46195102{3}[source]
    It's so easy to accidentally hit enter though, lol. I usually type larger prompts in my notes and copy-paste when finished.
    11. jay_kyburz ◴[] No.46195784[source]
    I hate it when I accidentally hit return halfway through writing my prompt and it gives me two pages of advice about some nonsense half-sentence.
    12. ◴[] No.46195852[source]
    13. __del__ ◴[] No.46201780{3}[source]
    This has been my casual finding as well. Why would I want all that conversational crap in the context window?