←back to thread

I'm absolutely right

(absolutelyright.lol)
648 points yoavfr | 1 comments | | HN request time: 0.369s | source
Show context
trjordan ◴[] No.45138620[source]
OK, so I love this, because we all recognize it.

It's not fully just a tic of language, though. Responses that start off with "You're right!" are alignment mechanisms. The LLM, with its single-token prediction approach, follows up with a suggestion that much more closely follows the user's desires, instead of latching onto it's own previous approach.

The other tic I love is "Actually, that's not right." That happens because once agents finish their tool-calling, they'll do a self-reflection step. That generates the "here's what I did response" or, if it sees an error, the "Actually, ..." change in approach. And again, that message contains a stub of how the approach should change, which allows the subsequent tool calls to actually pull that thread instead of stubbornly sticking to its guns.

The people behind the agents are fighting with the LLM just as much as we are, I'm pretty sure!

replies(11): >>45138772 #>>45138812 #>>45139686 #>>45139852 #>>45140141 #>>45140233 #>>45140703 #>>45140713 #>>45140722 #>>45140723 #>>45141393 #
unshavedyak ◴[] No.45138772[source]
I just wish they could hide these steering tokens in the thinking blurb or some such. Ie mostly hidden from the user. Having it reply to the user that way is quite annoying heh.
replies(1): >>45138996 #
KTibow ◴[] No.45138996[source]
This can still happen even with thinking models as long as the model outputs tokens in a sequence. Only way to fix would be to allow it to restart its response or switch to diffusion.
replies(3): >>45139207 #>>45139829 #>>45140424 #
derefr ◴[] No.45139829[source]
I think this poster is suggesting that, rather than "thinking" (messages emitted for oneself as audience) as a discrete step taken before "responding", the model should be trained to, during the response, tag certain sections with tokens indicating that the following token-stream until the matching tag is meant to be visibility-hidden from the client.

Less "independent work before coming to the meeting", more "mumbling quietly to oneself at the blackboard."

replies(1): >>45140531 #
adastra22 ◴[] No.45140531[source]
Doesn’t need training. Just don’t show it. Can be implemented client side.
replies(2): >>45141254 #>>45141415 #
1. derefr ◴[] No.45141415[source]
Just don't show... what? The specific exact text "You're absolutely right!"?

That heuristic wouldn't even survive the random fluctuations in how the model says it (it doesn't always say "absolutely"; the punctuation it uses is random; etc); let alone speaking to the model in another language, or challenging the model in the context of it roleplaying a character or having been otherwise prompted to use some other personality / manner of speech (where it still does emit this kind of "self-reminder" text, but using different words that cohere with the set personality.)

The point of teaching a model to emit inline <thinking> sequences, would be to allow the model to arbitrarily "mumble" (say things for its own benefit, that it knows would annoy people if spoken aloud), not just to "mumble" this one single thing.

Also, a frontend heuristic implies a specific frontend. I.e. it only applies to hosted-proprietary-model services that have a B2C chat frontend product offering tuned to the needs of their model (i.e. effectively just ChatGPT and Claude.) The text-that-should-be-mumbled wouldn't be tagged in any way if you call the same hosted-proprietary-model service through its API (so nobody building bots/agents on these platforms would benefit from the filtering.)

In contrast, if one of the hosted-proprietary-model chat services trained their model to tag its mumbles somehow in the response stream, then this would define an effective de-facto microformat for such mumbles — allowing any client (agent or frontend) consuming the conversation message stream through the API to have a known rule to pick out and hide arbitrary mumbles from the text (while still being able to make them visible to the user if the user desires, unlike if they were filtered out at the "business layer" [inference-host framework] level.)

And if general-purpose frameworks and clients began supporting that microformat, then other hosted-proprietary-model services — and orgs training open models — would see that the general-purpose frameworks/clients have this support, and so would seek to be compatible with that support, basically by aping the format the first mumbling hosted-proprietary-model emits.

(This is, in fact, exactly what already happened for the de-facto microformat that is OpenAI's reasoning-model explicit pre-response-message thinking-message format, i.e. the {"content_type": "thoughts", "thoughts": [{"summary": "...", "content": "..."}]} format.)