
Getting 50% (SoTA) on Arc-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points tomduncalf | 2 comments
asperous ◴[] No.40712326[source]
Having tons of people employ human ingenuity to manipulate existing LLMs into passing this one benchmark kind of defeats the purpose of testing for "AGI". The author points this out as it's more of a pattern matching test.

On the other hand, figuring out which manipulations are effective does teach us something. And since I think most problems boil down to pattern matching, creating a true, easily testable AGI test may be tough.

replies(5): >>40712503 #>>40712555 #>>40712632 #>>40713120 #>>40713156 #
1. janalsncm ◴[] No.40712503[source]
Perhaps if we don’t know how to create an evaluation that can’t be “gamed” it tells us something about how special our intelligence really is?
replies(1): >>40715393 #
2. lucianbr ◴[] No.40715393[source]
I don't know how to create a liver, or test one, so what does that say about my liver? Pretty much nothing.