Hallucination Leaderboard
"This evaluates how often an LLM introduces hallucinations when summarizing a document."
https://github.com/vectara/hallucination-leaderboard
If the figures on this leaderboard can be trusted, many frontier and near-frontier models are already better at this than the median white-collar worker.
To be clear, the leaderboard doesn't cover tool calling.