
421 points sohkamyung | 3 comments
1. nopinsight ◴[] No.45670377[source]
Hallucination Leaderboard "This evaluates how often an LLM introduces hallucinations when summarizing a document."

https://github.com/vectara/hallucination-leaderboard

If the figures on this leaderboard are to be trusted, many frontier and near-frontier models already outperform the median white-collar worker in this respect.

To be clear, the leaderboard doesn't cover tool calling.
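
For reference, the leaderboard's headline number is essentially the fraction of summaries containing claims unsupported by the source document. Below is a minimal sketch of that kind of evaluation; `summarize` and `is_supported` are hypothetical stand-ins (the actual leaderboard uses Vectara's HHEM consistency model as the judge):

    # Sketch of a hallucination-rate evaluation in the spirit of the
    # vectara/hallucination-leaderboard. Both callables are hypothetical
    # placeholders: `summarize` wraps whatever LLM is being evaluated,
    # and `is_supported` wraps a factual-consistency judge.
    from typing import Callable, List

    def hallucination_rate(
        documents: List[str],
        summarize: Callable[[str], str],
        is_supported: Callable[[str, str], bool],
    ) -> float:
        """Fraction of summaries judged inconsistent with their source."""
        if not documents:
            return 0.0
        flagged = 0
        for doc in documents:
            summary = summarize(doc)
            # A summary counts as a hallucination if the judge says it
            # is not supported by the source document.
            if not is_supported(doc, summary):
                flagged += 1
        return flagged / len(documents)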

replies(1): >>45670507 #
2. whatever1 ◴[] No.45670507[source]
I’ve been reviewing academic papers for decades, thousands of them, and I’ve never seen a fake citation. Misrepresented sources and cooked data, yes, but never an outright fabricated reference.

So for human authors, the min, max, and median of that fake-citation rate are all 0.

replies(1): >>45676326 #
3. nopinsight ◴[] No.45676326[source]
Agreed that current LLMs have low floors (they can still fabricate outright) despite decently high ceilings.

Note that people who write academic papers are quite far from the median white-collar worker.