AI agent benchmarks are broken

(ddkang.substack.com)
181 points neehao | 1 comments
xnx ◴[] No.44531958[source]
All benchmarks are flawed. Some benchmarks are useful.
replies(1): >>44532081 #
yifanl ◴[] No.44532081[source]
Here's a third sentence fragment: These benchmarks are not.
replies(2): >>44532272 #>>44534649 #
suddenlybananas ◴[] No.44532272[source]
It's nearly a haiku!
replies(1): >>44533605 #
1. layer8 ◴[] No.44533605[source]

  All benchmarks are flawed.
  Not all benchmarks are useless.
  But these benchmarks are.