(aisnakeoil.substack.com)

340 points agomez314 | 1 comment | 21 Mar 23 13:12 UTC

1. sebzim4500 [21 Mar 23 14:16 UTC] No.35246436

Clearly contaminated benchmarks are not very useful, but I do not understand the assertion that we should care about "Qualitative studies of professionals using AI" over "Comparison on real world tasks". I've looked through these benchmarks in detail, and I've come to the conclusion that real-world performance is all that matters. Everything else is either incredibly subjective or designed to beat a particular prior model.

GPT-4 and professional benchmarks: the wrong answer to the wrong question