
I'm absolutely right

(absolutelyright.lol)
648 points yoavfr | 8 comments
trjordan ◴[] No.45138620[source]
OK, so I love this, because we all recognize it.

It's not fully just a tic of language, though. Responses that start off with "You're right!" are alignment mechanisms. The LLM, with its single-token prediction approach, follows up with a suggestion that much more closely follows the user's desires, instead of latching onto its own previous approach.

The other tic I love is "Actually, that's not right." That happens because once agents finish their tool-calling, they'll do a self-reflection step. That generates the "here's what I did" response or, if it sees an error, the "Actually, ..." change in approach. And again, that message contains a stub of how the approach should change, which allows the subsequent tool calls to actually pull that thread instead of stubbornly sticking to its guns.
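A minimal sketch of that self-reflection step, as a plain loop. Everything here (the `reflect` helper, the message shapes) is hypothetical, not Claude Code's actual internals; the point is only that the correction stub gets appended to the transcript, so the next generation continues from it:

```python
# Hypothetical self-reflection step after tool calls finish.
# If the tool result looks like an error, emit an "Actually, ..."
# stub that primes the next turn to change approach instead of
# repeating the previous one.

def reflect(tool_result: str) -> str:
    if "error" in tool_result.lower() or "Traceback" in tool_result:
        return "Actually, that's not right. Let me try a different approach:"
    return "Here's what I did:"

def agent_step(tool_result: str, transcript: list) -> list:
    # The stub becomes part of the context for the next model call,
    # so subsequent tool calls pull that thread.
    transcript.append({"role": "assistant", "content": reflect(tool_result)})
    return transcript

transcript = agent_step("TypeError: 'NoneType' object is not callable... Traceback", [])
print(transcript[-1]["content"])
```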

The people behind the agents are fighting with the LLM just as much as we are, I'm pretty sure!

replies(11): >>45138772 #>>45138812 #>>45139686 #>>45139852 #>>45140141 #>>45140233 #>>45140703 #>>45140713 #>>45140722 #>>45140723 #>>45141393 #
1. nojs ◴[] No.45138812[source]
Yeah, I figure this is also why it often says “Ah, I found the problem! Let me check the …”. It hasn’t found the problem, but it’s more likely to continue with the solution if you jam that string in there.
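One concrete way to "jam that string in" is prefilling the start of the assistant turn, so the model has to continue from the optimistic framing rather than start fresh. This is a sketch against a generic chat-message shape; whether Claude Code actually does this is a guess:

```python
# Sketch of priming via an assistant-turn prefill: the final message
# is a partial assistant message, and the model is asked to continue
# from it. Message format mirrors common chat-completion APIs.

def build_messages(history: list, prefill: str = "Ah, I found the problem! Let me check the ") -> list:
    # The prefill commits the model to the "found it" framing before
    # any real diagnosis has happened.
    return history + [{"role": "assistant", "content": prefill}]

msgs = build_messages([{"role": "user", "content": "The tests still fail."}])
print(msgs[-1]["content"])
```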
replies(1): >>45140521 #
2. adastra22 ◴[] No.45140521[source]
We don’t know how Claude Code is internally implemented. I would not be surprised at all if they literally inject that string as an alternative context and then go with the higher-probability output, or if RLHF was structured that way, so it always generates the same text.
replies(2): >>45141578 #>>45142215 #
3. data-ottawa ◴[] No.45141578[source]
Very likely RLHF, based only on how strongly aligned open models repeatedly reference a "policy" despite there being none in the system prompt.

I would assume that priming the model to add these tokens ends up with better autocomplete as mentioned above.

4. steveklabnik ◴[] No.45142215[source]
Claude Code is a big pile of minified TypeScript, and some people have effectively de-compiled it.
replies(1): >>45142655 #
5. sejje ◴[] No.45142655{3}[source]
So how does it do it?
replies(1): >>45142736 #
6. steveklabnik ◴[] No.45142736{4}[source]
I haven't read this particular code. I did some analysis of various prompts it uses, and I didn't hear about anything specific like this. Mostly wanted to say "it's at least possible to dig into it if you'd like," not that I had the answer directly.
replies(1): >>45145292 #
7. Aeolun ◴[] No.45145292{5}[source]
Couldn’t you have claude itself de-minify it?
replies(1): >>45149919 #
8. steveklabnik ◴[] No.45149919{6}[source]
Maybe. It’s not something I have enough of an interest in to put the time into trying it out.