340 points agomez314 | 1 comments
wdefoor ◴[] No.35246425[source]
OpenAI didn’t conduct the bar exam study, Casetext and Stanford did (gotta read those footnotes). The questions were from after the knowledge cutoff and passed the contamination check.
replies(2): >>35247112 #>>35250825 #
1. calf ◴[] No.35250825[source]
The main issue is the inapplicability of a test designed for humans, because an LLM's cognition is very different. Checking for contamination presumes that this style of test is applicable in the first place.