AI agent benchmarks are broken

(ddkang.substack.com)
181 points neehao | 1 comment
KTibow No.44534575
This is more or less a funnel to their Agentic Benchmark Checklist: https://arxiv.org/abs/2507.02825
replies(1): >>44536725
1. nerevarthelame No.44536725
Finally, a benchmark for benchmarks. And what's great is that they already benchmarked their benchmark benchmark.

(Apologies for the benchmark snark. I'm glad people are doing this research, thanks for sharing it.)