slacker news
AI agent benchmarks are broken
(ddkang.substack.com)
181 points
neehao
| 1 comment |
11 Jul 25 13:06 UTC
1. anupj | 11 Jul 25 13:25 UTC | No. 44531868 | in reply to >>44531697 (OP)
AI agent benchmarks are starting to feel like the self-driving car demos of 2016: impressive until you realize the test track has speed bumps labeled "success."