AI agent benchmarks are broken

(ddkang.substack.com)
181 points neehao | 2 comments
1. KTibow No.44534575
This is more or less a funnel to their Agentic Benchmark Checklist: https://arxiv.org/abs/2507.02825
replies(1): >>44536725
2. nerevarthelame No.44536725
Finally, a benchmark for benchmarks. And what's great is that they already benchmarked their benchmark benchmark.

(Apologies for the benchmark snark. I'm glad people are doing this research, thanks for sharing it.)