Thanks for pointing out the elephant in the room with LLMs.
The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.
How much money would it take to bribe someone into taking part in a genocide? The thing is, some people would not accept any amount, while others would do it proactively for no money at all.
Take the idea of the checklist. If you give it to a person and tell them to work from it, and it's their job, they will do so. But with LLM agents, you can give them the checklist, and maybe they apply it at first, but eventually they completely forget it exists. The longer the conversation goes on without any reminder, the more likely they are to act as if the checklist never existed at all. And you can't know when that point is, so the best solution we have right now is to constantly remind them that the checklist exists.
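To make that workaround concrete, here's a minimal sketch, assuming a chat-style message list and a hypothetical call_llm() stand-in for whatever API you actually use; the only idea is to re-inject the checklist on every turn rather than trusting the model to remember it:

```python
# Minimal sketch: re-send the checklist with every request instead of
# relying on the model to remember it from earlier in the conversation.
# call_llm() is a hypothetical placeholder for a real chat-completion call.

CHECKLIST = """Before answering:
1. Cite a source for every factual claim.
2. Say "I don't know" instead of guessing.
3. Flag any step you skipped."""

def call_llm(messages):
    # Placeholder for the actual API call.
    raise NotImplementedError

def ask(history, user_message):
    # The checklist goes in fresh as a system message on *every* turn,
    # so it never drifts out of the model's effective attention.
    messages = [{"role": "system", "content": CHECKLIST}]
    messages += history
    messages.append({"role": "user", "content": user_message})

    reply = call_llm(messages)

    # Only the conversation itself is stored; the checklist is never
    # "saved once at the start" and then forgotten.
    history.append({"role": "user", "content": user_message})
    history.append({"role": "assistant", "content": reply})
    return reply
```

Even that only lowers the odds; nothing guarantees the model will follow a checklist that is sitting right in front of it.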
This is the kind of nondeterminism that makes LLMs particularly problematic as tools, and a very different proposition from a human: it's less like working with an expert and more like working with a dementia patient.