(www.bbc.co.uk)

421 points sohkamyung | 2 comments | 22 Oct 25 13:39 UTC

nopinsight ◴[22 Oct 25 15:11 UTC] No.45670377[source]▶

Hallucination Leaderboard: "This evaluates how often an LLM introduces hallucinations when summarizing a document."

https://github.com/vectara/hallucination-leaderboard

If the figures on this leaderboard are to be trusted, many frontier and near-frontier models are already better than the median white-collar worker in this respect.

Note: The leaderboard doesn't cover tool calling, to be clear.

replies(1): >>45670507 #

1. whatever1 ◴[22 Oct 25 15:20 UTC] No.45670507[source]▶

>>45670377 #

I’ve been reviewing academic papers for decades, and I’ve reviewed thousands of them. I’ve never seen a fake citation. I’ve seen misrepresented sources and cooked data, but never a straight-up fake citation.

So the min, max, and median are all 0.

replies(1): >>45676326 #

2. nopinsight ◴[22 Oct 25 23:11 UTC] No.45676326[source]▶

>>45670507 (TP) #

Agreed that current LLMs have low floors despite decently high ceilings.

Note that people who write academic papers are quite far from the median white-collar worker.

AI assistants misrepresent news content 45% of the time