AI agent benchmarks are broken
(ddkang.substack.com)
181 points
neehao
| 1 comments |
11 Jul 25 13:06 UTC
xnx
◴[
11 Jul 25 13:33 UTC
]
No.
44531958
>>44531697 (OP)
#
All benchmarks are flawed. Some benchmarks are useful.
replies(1):
>>44532081
#
yifanl
◴[
11 Jul 25 13:44 UTC
]
No.
44532081
>>44531958
#
Here's a third sentence fragment: These benchmarks are not.
replies(2):
>>44532272
#
>>44534649
#
suddenlybananas
◴[
11 Jul 25 14:02 UTC
]
No.
44532272
>>44532081
#
It's nearly a haiku!
replies(1):
>>44533605
#
layer8
◴[
11 Jul 25 15:52 UTC
]
No.
44533605
>>44532272
#
All benchmarks are flawed. Not all benchmarks are useless. But these benchmarks are.