
323 points steerlabs | 3 comments
keiferski No.46192154
The thing that bothers me the most about LLMs is how they never seem to understand "the flow" of an actual conversation between humans. When I ask a person something, I expect them to give me a short reply that includes a follow-up question or asks for details or clarification. A conversation is thus an ongoing "dance" in which the questioner and answerer gradually arrive at the same shared meaning.

LLMs don't do this. Instead, every question is immediately answered with extreme confidence and a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human exchange I mentioned above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.

1. vidarh No.46195794
Having worked on a supervised fine-tuning project for one of the largest companies in this space, I suspect a lot of this comes down to providers having invested a lot of money in fine-tuning datasets that sound this way.

On the project I worked on, reviewers were not allowed to, e.g., answer that they didn't know; they had to provide an answer to every prompt. So when auditing responses, a lot of difficult questions had "confidently wrong" answers because the reviewer had tried and failed, or all kinds of evasive workarounds because the reviewer knew they couldn't answer.
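
To make that concrete, here is a hypothetical sketch of the kind of guideline check I mean; the phrase list and function below are my own illustration, not anything from the actual project:

    # Hypothetical illustration: a review rule that rejects any answer
    # reading as a non-answer, which quietly selects for confident-sounding
    # text even when the reviewer is genuinely unsure.
    REFUSAL_PHRASES = [
        "i don't know",
        "i'm not sure",
        "cannot answer",
        "no way to tell",
    ]

    def passes_review_guideline(answer: str) -> bool:
        """Return False if the answer reads as a refusal or non-answer."""
        lowered = answer.lower()
        return not any(phrase in lowered for phrase in REFUSAL_PHRASES)

    # An honest but uncertain answer gets filtered out...
    assert not passes_review_guideline("I'm not sure; the sources conflict.")
    # ...while a confidently wrong one sails through.
    assert passes_review_guideline("The answer is definitely 42.")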

Presumably these providers will eventually understand (hopefully they already have - this was a year ago) that they also need to train the models to recognise when the correct answer is "I don't know", or "I'm not sure. I think maybe X, but ..."
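
A minimal sketch of what I mean, with made-up training examples rather than anything from a real dataset: the fix is to make calibrated uncertainty a valid target in the fine-tuning data itself.

    # Made-up SFT records: alongside confident answers, the dataset
    # deliberately includes prompts whose correct completion is an
    # honest "I don't know" or a hedged guess.
    sft_examples = [
        {
            "prompt": "What year was the Eiffel Tower completed?",
            "response": "1889.",
        },
        {
            "prompt": "What will the EUR/USD rate be next March?",
            "response": "I don't know; future exchange rates can't be predicted reliably.",
        },
        {
            "prompt": "Which of these two obscure preprints came out first?",
            "response": "I'm not sure. I think maybe the first one, but I'd want to check the dates.",
        },
    ]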

replies(1): >>46196736 #
2. ActorNightly No.46196736
It's not the training/tuning; it's pretty much the nature of LLMs. The whole idea is to give a best guess of the next token. The more complex dynamics behind the meaning of the words, and how those words relate to real-world concepts, aren't learned.
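
Roughly, in toy form (the vocabulary and logits below are invented for illustration, not from any real model):

    import numpy as np

    # Toy sketch of "best guess of the next token": the model produces one
    # logit per vocabulary item and decoding just picks from the resulting
    # distribution.
    vocab = ["I", "don't", "know", "Paris", "42"]
    logits = np.array([0.2, 0.1, 0.3, 2.5, 1.1])   # pretend model output

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                           # softmax over the vocabulary

    next_token = vocab[int(np.argmax(probs))]      # greedy decoding
    print(next_token, probs.round(3))              # "Paris" gets the highest probability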
replies(1): >>46197264 #
3. vidarh No.46197264
You're not making any sense. The best guess will often be a refusal if the model sees enough of them in the training data, so of course it comes down to training.

And I saw the effect of this first hand, in how the project I worked on was actively part of training this behaviour into a major model.

As for your assertion that they don't learn the more complex dynamics: that was trite, and it wasn't true even several years ago.