slacker news
AI agent benchmarks are broken
(ddkang.substack.com)
181 points | neehao | 2 comments | 11 Jul 25 13:06 UTC
1. KTibow [11 Jul 25 17:05 UTC] No. 44534575
>>44531697 (OP)
This is more or less a funnel to their Agentic Benchmark Checklist:
https://arxiv.org/abs/2507.02825
replies(1): >>44536725
2. nerevarthelame [11 Jul 25 20:57 UTC] No. 44536725
>>44534575 (TP)
Finally, a benchmark for benchmarks. And what's great is that they already benchmarked their benchmark benchmark.
(Apologies for the benchmark snark. I'm glad people are doing this research, thanks for sharing it.)