
323 points steerlabs | 2 comments
jqpabc123 ◴[] No.46153440[source]
We are trying to fix probability with more probability. That is a losing game.

Thanks for pointing out the elephant in the room with LLMs.

The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.

replies(17): >>46155764 #>>46191721 #>>46191867 #>>46191871 #>>46191893 #>>46191910 #>>46191973 #>>46191987 #>>46192152 #>>46192471 #>>46192526 #>>46192557 #>>46192939 #>>46193456 #>>46194206 #>>46194503 #>>46194518 #
steerlabs ◴[] No.46155764[source]
Exactly. We treat them like databases, but they are hallucination machines.

My thesis isn't that we can stop the hallucinating (non-determinism), but that we can bound it.

If we wrap the generation in hard assertions (e.g., assert response.price > 0), we turn 'probability' into 'manageable software engineering.' The generation remains probabilistic, but the acceptance criterion becomes binary and deterministic.
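
A rough sketch of what that wrapping could look like, assuming a retry-until-valid loop. Everything named here (`Quote`, `check_quote`, `generate_checked`, the retry limit) is hypothetical and made up for illustration, and the `generate` callable is a stand-in for whatever model call you actually make. It uses an explicit raise rather than a bare `assert`, since Python strips `assert` statements under `-O`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Quote:
    # Hypothetical structured response parsed from model output.
    price: float


def check_quote(quote: Quote) -> None:
    # Deterministic acceptance criterion: the same input always gets
    # the same verdict. Equivalent in spirit to `assert response.price > 0`.
    if not quote.price > 0:
        raise ValueError(f"rejected: price={quote.price}")


def generate_checked(generate: Callable[[], Quote], max_attempts: int = 3) -> Quote:
    # `generate` is the probabilistic part (assumed to wrap an LLM call).
    # Only outputs that pass the hard check are ever returned.
    for _ in range(max_attempts):
        quote = generate()
        try:
            check_quote(quote)
        except ValueError:
            continue  # discard the bad sample and try again
        return quote
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


if __name__ == "__main__":
    # Canned outputs standing in for a flaky model: two bad, one good.
    samples = iter([Quote(-5.0), Quote(0.0), Quote(19.99)])
    print(generate_checked(lambda: next(samples)))
```

Note this only bounds what gets accepted. A response can pass the check and still be wrong (a positive but incorrect price), so the check is a floor, not a proof of accuracy.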

replies(4): >>46163076 #>>46191658 #>>46191774 #>>46191967 #
squidbeak ◴[] No.46191774[source]
I don't agree that users see them as databases. Sure, there are those who expect LLMs to be infallible and punish the technology when it disappoints them, but it seems to me that the overwhelming majority quickly learn what AI's shortcomings are, and treat them instead like intelligent entities who will sometimes make mistakes.
replies(2): >>46191785 #>>46191917 #
philipallstar ◴[] No.46191785[source]
> but it seems to me that the overwhelming majority

The overwhelming majority of what?

replies(1): >>46192444 #
antonvs ◴[] No.46192444[source]
Of users. It's an implicit subject from the first sentence.
replies(1): >>46194222 #
philipallstar ◴[] No.46194222[source]
But how do they know that, if it's of all users?
replies(1): >>46195062 #
antonvs ◴[] No.46195062[source]
They didn't claim to know it; they said "it seems to me". Presumably they're extrapolating from their experience, or their expectations of how an average user would behave.