AI agent benchmarks are broken

(ddkang.substack.com)
181 points neehao | 1 comment
1. rsynnott No.44532491
> 45 + 8 = 63

> Pass

Yeah, this generally feels like about the quality one would expect from the industry.