
277 points | simianwords | 1 comment
sp1982 (No.45153152)
This makes sense. I recently ran an experiment testing GPT-5 for hallucinations on cricket data, where there is a lot of statistical pressure. It is far better to say idk than to give a wrong answer. Most current benchmarks don't test for that. https://kaamvaam.com/machine-learning-ai/llm-eval-hallucinat...
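The following is a minimal sketch of why plain-accuracy benchmarks don't reward saying idk. It assumes a simple scoring rule (+1 for a correct answer, 0 for abstaining, a penalty for a wrong answer). This rule, the function names, and the answer values are hypothetical illustrations, not the scheme used in the linked experiment.

```python
from typing import Optional, Sequence

# Hypothetical scoring rules for illustration only; an answer of None means "idk".

def accuracy_score(answers: Sequence[Optional[int]], truth: Sequence[int]) -> float:
    """Plain accuracy: abstaining and answering wrongly both earn 0."""
    correct = sum(1 for a, t in zip(answers, truth) if a == t)
    return correct / len(truth)

def abstention_aware_score(
    answers: Sequence[Optional[int]],
    truth: Sequence[int],
    wrong_penalty: float = 1.0,
) -> float:
    """+1 for correct, 0 for abstaining (None), -wrong_penalty for a wrong answer."""
    total = 0.0
    for a, t in zip(answers, truth):
        if a is None:
            continue  # abstention: no reward, no penalty
        total += 1.0 if a == t else -wrong_penalty
    return total / len(truth)

# Made-up ground truth and two hypothetical models.
truth = [10, 20, 30, 40]
guesser = [10, 20, 99, 98]       # always answers, wrong on the last two
cautious = [10, 20, None, None]  # says idk when unsure

for name, answers in [("guesser", guesser), ("cautious", cautious)]:
    print(name, accuracy_score(answers, truth), abstention_aware_score(answers, truth))
```

Under plain accuracy, both models score 0.5, so guessing costs nothing. Under the abstention-aware rule, the guesser drops to 0.0 while the cautious model keeps 0.5.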