slacker news
AI agent benchmarks are broken
(ddkang.substack.com)
181 points
neehao
| 1 comment |
11 Jul 25 13:06 UTC
1. neehao | 11 Jul 25 16:32 UTC | No. 44534145 | >>44531697 (OP)
And I would say we often need effortful labels from groups of humans:
https://www.gojiberries.io/superhuman-level-performance/