(aisnakeoil.substack.com)

340 points agomez314 | 1 comments | 21 Mar 23 13:12 UTC

wdefoor ◴[21 Mar 23 14:16 UTC] No.35246425[source]▶

OpenAI didn’t conduct the bar exam study; Casetext and Stanford did (gotta read those footnotes). The questions were from after the knowledge cutoff and passed the contamination check.

replies(2): >>35247112 #>>35250825 #

1. calf ◴[21 Mar 23 18:56 UTC] No.35250825[source]▶

>>35246425 #

The main issue is the inapplicability of a test designed for humans, because an LLM's cognition is very different. Contamination presumes that the style of tests is applicable.

GPT-4 and professional benchmarks: the wrong answer to the wrong question