While I get the academic value of sharing these insights, this article comes across as a company justifying (or complaining about) its model scoring lower on the leaderboards than it should... by saying the leaderboards are wrong.
Or, an even darker take: it's the company saying it won't prioritize eliminating hallucinations until the leaderboards reward it.