
I'm absolutely right

(absolutelyright.lol)
648 points yoavfr | 8 comments
trjordan ◴[] No.45138620[source]
OK, so I love this, because we all recognize it.

It's not just a verbal tic, though. Responses that start with "You're right!" are alignment mechanisms. Because the LLM predicts one token at a time, conditioning on that opener makes the follow-up suggestion track the user's wishes much more closely, instead of latching onto its own previous approach.
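
One way to picture that conditioning argument is the sketch below. Everything here is illustrative (the `complete` helper and the example messages are made up, not any vendor's API): the point is only that once the turn opens with a concession, every later token is generated conditioned on it.

    # Minimal sketch of the conditioning argument; `complete` is a stand-in for
    # a real chat-completion call, not any particular vendor's API.

    def complete(messages: list[dict], prefix: str = "") -> str:
        # Placeholder: a real implementation would send `messages` to an LLM
        # and return the model's continuation of `prefix`.
        return prefix + "<model continuation>"

    history = [
        {"role": "assistant", "content": "I refactored this into a singleton."},
        {"role": "user", "content": "Please don't; just pass the config in explicitly."},
    ]

    # Unconstrained, the next turn may keep defending the earlier approach.
    unsteered = complete(history)

    # Once the turn is forced to open with a concession, every subsequent token
    # is conditioned on it, so the continuation tends to follow the user's
    # request rather than the model's previous plan.
    steered = complete(history, prefix="You're right! ")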

The other tic I love is "Actually, that's not right." That one happens because once agents finish their tool-calling, they do a self-reflection step. That step generates either the "here's what I did" response or, if it spots an error, the "Actually, ..." change in approach. And again, that message contains a stub of how the approach should change, which lets the subsequent tool calls actually pull on that thread instead of stubbornly sticking to its guns.
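
A rough sketch of what that loop can look like (all of the names and prompts below are assumptions, not any specific agent framework): after the tool calls return, the agent is asked to review the results, and whatever it writes there, including an "Actually, that's not right" stub, lands in the context that the next round of tool calls is generated from.

    # Illustrative agent loop with a post-tool-call reflection step.
    # `llm`, `run_tool_calls`, and the prompts are placeholders.

    def agent_loop(llm, run_tool_calls, task: str, max_rounds: int = 5) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_rounds):
            plan = llm(messages)                   # may contain tool calls
            messages.append({"role": "assistant", "content": plan})

            results = run_tool_calls(plan)         # execute the requested tools
            messages.append({"role": "tool", "content": results})

            # Self-reflection step: summarize what happened, or flag a mistake.
            reflection = llm(messages + [{
                "role": "user",
                "content": "Review the tool output. Summarize what you did, or "
                           "if something went wrong, say so and outline a new approach.",
            }])
            messages.append({"role": "assistant", "content": reflection})

            # An "Actually, ..." reflection keeps the loop going; its stub of the
            # new approach is now in context for the next round of tool calls.
            if not reflection.lower().startswith("actually"):
                return reflection
        return messages[-1]["content"]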

The people behind the agents are fighting with the LLM just as much as we are, I'm pretty sure!

replies(11): >>45138772 #>>45138812 #>>45139686 #>>45139852 #>>45140141 #>>45140233 #>>45140703 #>>45140713 #>>45140722 #>>45140723 #>>45141393 #
al_borland ◴[] No.45140722[source]
In my experience, once it starts telling me I’m right, we’re already going downhill and it rarely gets better from there.
replies(4): >>45141151 #>>45143167 #>>45145334 #>>45146082 #
1. flkiwi ◴[] No.45141151[source]
Sometimes I just ride the lightning to see how off course it is willing to go. This is not a productive use of my time but it sure is amusing.

In fairness, I’ve done the same thing to overconfident junior colleagues.

replies(3): >>45141631 #>>45142374 #>>45165030 #
2. al_borland ◴[] No.45141631[source]
I spent yesterday afternoon doing this. It got to the point where it would acknowledge it was wrong, but would keep giving me the same answer.

It also said it would only try one more time before giving up, but then kept going.

replies(2): >>45142992 #>>45145937 #
3. neilv ◴[] No.45142374[source]
They should've called it "riding the lightning":

https://en.wikipedia.org/wiki/Socratic_dialogue

4. dingnuts ◴[] No.45142992[source]
This happens to me constantly, and it's such a huge waste of time. I'm not convinced any of these tools actually save time. It's all a fucking slot machine plus Gell-Mann amnesia, and at the end you often have nothing that works.

I spent like two hours yesterday dicking with aider to make a one-line change. It hallucinated an invalid input for the only possible parameter, and I wound up reading the docs the old-fashioned way and doing the task in about two minutes.

replies(1): >>45143240 #
5. brianwawok ◴[] No.45143240{3}[source]
The mistake was using AI for a two-minute fix. It genuinely helps with some tasks, but it takes some failures to realize that it does indeed have flaws.
replies(1): >>45143957 #
6. ◴[] No.45143957{4}[source]
7. CuriouslyC ◴[] No.45145937[source]
The worst is when it can't do something right and does a horrible mock/hack to get it "working." I had Claude fake benchmark data, which pissed me off a bit, though I did make a major architectural improvement to a tool as a result (the real benchmark would probably have pushed me to do it anyway), so it wasn't all horrible.
8. theshrike79 ◴[] No.45165030[source]
It's also educational to play stupid with LLMs.

Even if you know exactly where the issue is and it would be a 30-second job to fix it manually, you get a better feel for how to direct the model: what you need to tell it so that it finds the issue and fixes it.