(ddkang.substack.com)

181 points neehao | 1 comment | 11 Jul 25 13:06 UTC

1. beebmam [11 Jul 25 15:59 UTC] No.44533728

I don't think benchmarks are the right way to evaluate AI-related processes. The problem is probably similar to the difficulty of measuring human intelligence: a test score says little about how well a given person can handle real-world problems.

AI agent benchmarks are broken