AI agent benchmarks are broken
(ddkang.substack.com)
181 points | neehao | 1 comments | 11 Jul 25 13:06 UTC
KTibow [11 Jul 25 17:05 UTC] No. 44534575 >>44531697 (OP)
This is more or less a funnel to their Agentic Benchmark Checklist:
https://arxiv.org/abs/2507.02825
replies(1): >>44536725
nerevarthelame [11 Jul 25 20:57 UTC] No. 44536725 >>44534575
Finally, a benchmark for benchmarks. And what's great is that they already benchmarked their benchmark benchmark.
(Apologies for the benchmark snark. I'm glad people are doing this research, thanks for sharing it.)