
I'm absolutely right

(absolutelyright.lol)
648 points yoavfr | 8 comments
trjordan ◴[] No.45138620[source]
OK, so I love this, because we all recognize it.

It's not fully just a tic of language, though. Responses that start off with "You're right!" are alignment mechanisms. The LLM, with its single-token prediction approach, follows up with a suggestion that much more closely follows the user's desires, instead of latching onto its own previous approach.

The other tic I love is "Actually, that's not right." That happens because once agents finish their tool-calling, they'll do a self-reflection step. That generates the "here's what I did" response or, if it sees an error, the "Actually, ..." change in approach. And again, that message contains a stub of how the approach should change, which allows the subsequent tool calls to actually pull that thread instead of stubbornly sticking to its guns.
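A minimal sketch of that self-reflection step, as a plain loop. Everything here (the `reflect` helper, the message shapes) is hypothetical, not Claude Code's actual internals; the point is only that the correction stub gets appended to the transcript, so the next generation continues from it:

```python
# Hypothetical self-reflection step after tool calls finish.
# If the tool result looks like an error, emit an "Actually, ..."
# stub that primes the next turn to change approach instead of
# repeating the previous one.

def reflect(tool_result: str) -> str:
    if "error" in tool_result.lower() or "Traceback" in tool_result:
        return "Actually, that's not right. Let me try a different approach:"
    return "Here's what I did:"

def agent_step(tool_result: str, transcript: list) -> list:
    # The stub becomes part of the context for the next model call,
    # so subsequent tool calls pull that thread.
    transcript.append({"role": "assistant", "content": reflect(tool_result)})
    return transcript

transcript = agent_step("TypeError: 'NoneType' object is not callable... Traceback", [])
print(transcript[-1]["content"])
```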

The people behind the agents are fighting with the LLM just as much as we are, I'm pretty sure!

replies(11): >>45138772 #>>45138812 #>>45139686 #>>45139852 #>>45140141 #>>45140233 #>>45140703 #>>45140713 #>>45140722 #>>45140723 #>>45141393 #
1. nojs ◴[] No.45138812[source]
Yeah, I figure this is also why it often says “Ah, I found the problem! Let me check the …”. It hasn’t found the problem, but it’s more likely to continue with the solution if you jam that string in there.
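One concrete way to "jam that string in" is prefilling the start of the assistant turn, so the model has to continue from the optimistic framing rather than start fresh. This is a sketch against a generic chat-message shape; whether Claude Code actually does this is a guess:

```python
# Sketch of priming via an assistant-turn prefill: the final message
# is a partial assistant message, and the model is asked to continue
# from it. Message format mirrors common chat-completion APIs.

def build_messages(history: list, prefill: str = "Ah, I found the problem! Let me check the ") -> list:
    # The prefill commits the model to the "found it" framing before
    # any real diagnosis has happened.
    return history + [{"role": "assistant", "content": prefill}]

msgs = build_messages([{"role": "user", "content": "The tests still fail."}])
print(msgs[-1]["content"])
```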
replies(1): >>45140521 #
2. adastra22 ◴[] No.45140521[source]
We don’t know how Claude Code is internally implemented. I would not be surprised at all if they literally inject that string as an alternative context and then go with the higher-probability output, or if RLHF was structured that way, so it always generates the same text.
replies(2): >>45141578 #>>45142215 #
3. data-ottawa ◴[] No.45141578[source]
Very likely RLHF, based only on how strongly aligned open models repeatedly reference a "policy" despite there being none in the system prompt.

I would assume that priming the model to add these tokens ends up with better autocomplete as mentioned above.

4. steveklabnik ◴[] No.45142215[source]
Claude Code is a big pile of minified TypeScript, and some people have effectively de-compiled it.
replies(1): >>45142655 #
5. sejje ◴[] No.45142655{3}[source]
So how does it do it?
replies(1): >>45142736 #
6. steveklabnik ◴[] No.45142736{4}[source]
I haven't read this particular code. I did some analysis of various prompts it uses, and I didn't hear about anything specific like this. Mostly wanted to say "it's at least possible to dig into it if you'd like," not that I had the answer directly.
replies(1): >>45145292 #
7. Aeolun ◴[] No.45145292{5}[source]
Couldn’t you have claude itself de-minify it?
replies(1): >>45149919 #
8. steveklabnik ◴[] No.45149919{6}[source]
Maybe. It’s not something I have enough of an interest in to put the time into trying it out.