
Getting 50% (SoTA) on ARC-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points by tomduncalf | 2 comments
asperous No.40712326
Having tons of people employ human ingenuity to manipulate existing LLMs into passing this one benchmark kind of defeats the purpose of testing for "AGI". The author points this out, since it's more of a pattern-matching test.

Though on the other hand, figuring out which manipulations are effective does teach us something. And I think most problems boil down to pattern matching, so creating a true, easily testable AGI test may be tough.

replies(5): >>40712503 #>>40712555 #>>40712632 #>>40713120 #>>40713156 #
opdahl No.40712555
Wouldn't the real AGI test be whether an AI could do what the author did here and write this blog post?
replies(2): >>40712730 #>>40716667 #
atroche No.40712730
Yep, but a float is more useful than a bool for tracking progress, especially if you want to answer questions like "how soon can we expect (drivers/customer support staff/programmers) to lose their jobs?"

Hard to find the right float but worth trying I think.

replies(1): >>40713241 #
opdahl No.40713241
I agree, but it does seem a bit strange that you are allowed to "custom-fit" an AI program to solve a specific benchmark. Shouldn't there be some sort of rule that, for something to count as AGI, it should work as "off-the-shelf" as possible?
replies(1): >>40713415 #
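
Concretely, the "custom-fit" program under discussion is a scaffold along roughly these lines: ask the model for many candidate Python programs per puzzle, execute them against the training pairs, and submit the output of a program that reproduces them. The sketch below is illustrative only; ask_llm_for_programs is a hypothetical stand-in for the actual GPT-4o calls, and ARC grids are assumed to be lists of lists of ints.

    def solve_task(train_pairs, test_input, ask_llm_for_programs, n_candidates=128):
        """Generate candidate transform programs and keep one that fits the examples."""
        candidates = ask_llm_for_programs(train_pairs, n=n_candidates)  # Python source strings
        for src in candidates:
            scope = {}
            try:
                exec(src, scope)                  # each candidate should define transform(grid)
                transform = scope["transform"]
                if all(transform(inp) == out for inp, out in train_pairs):
                    return transform(test_input)  # first program matching all examples wins
            except Exception:
                continue                          # malformed or crashing candidates are discarded
        return None                               # nothing reproduced the training examples
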
soist No.40713415
If OpenAI had an embedded Python interpreter, or for that matter an interpreter for lambda calculus or some other Turing-complete formalism, then this approach would work, but there are no LLMs with embedded symbolic interpreters. LLMs currently are essentially probability distributions based on a training corpus and do not have any symbolic reasoning capabilities. There is no backtracking, for example, like in Prolog.
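
For readers unfamiliar with the Prolog reference, here is a minimal Python sketch (my illustration, not anything from the linked post) of the backtracking style being described: the solver commits to a choice, recurses, and explicitly undoes the choice when a constraint fails, whereas left-to-right token sampling has no comparable undo step. The map-coloring problem and all names are illustrative.

    def color_map(regions, adjacent, colors):
        """Assign colors so no two adjacent regions share one, via depth-first backtracking."""
        assignment = {}

        def consistent(region, color):
            return all(assignment.get(nbr) != color for nbr in adjacent[region])

        def solve(i):
            if i == len(regions):
                return True
            region = regions[i]
            for color in colors:
                if consistent(region, color):
                    assignment[region] = color   # choose
                    if solve(i + 1):             # descend
                        return True
                    del assignment[region]       # backtrack: undo the choice
            return False                         # no color fits; fail upward

        return assignment if solve(0) else None

    print(color_map(
        ["WA", "NT", "SA", "Q", "NSW", "V"],
        {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
         "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
         "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"]},
        ["red", "green", "blue"],
    ))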