AI agent benchmarks are broken

(ddkang.substack.com)
181 points | neehao | 1 comment
1. neehao No.44534145
And I would say we often need effortful labels from groups of humans: https://www.gojiberries.io/superhuman-level-performance/