
Getting 50% (SoTA) on Arc-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points by tomduncalf
extr (No.40712008)
Very cool. When GPT-4 first came out I tried some very naive approaches using JSON representations of the puzzles [0], [1]. GPT-4 did "okay", but in some cases it felt like it was falling for the classic LLM issue of saying all the right things but then failing to grasp some critical bit of logic and missing the solution entirely.

At the time I noticed that many of the ARC problems rely on visual-spatial priors that are "obvious" when viewing the grids, but become less so when transmuted to some other representation. Many of them rely on some kind of symmetry, counting, or the very human bias to assume a velocity or continued movement when seeing particular patterns.

I had always thought maybe multimodality was key: the model needs to have similar priors around grounded physical spaces and movement to be able to do well. I'm not sure the OP really fleshes this line of thinking out; brute-forcing Python solutions is a very "non-human" approach.

[0] https://x.com/eatpraydiehard/status/1632671307254099968

[1] https://x.com/eatpraydiehard/status/1632683214329479169
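
For concreteness, here's a minimal sketch of the kind of JSON prompt I mean; the toy task and wording are illustrative, not the exact prompts from [0]/[1]:

    import json

    # One toy ARC-style task: grids are 2D lists of color indices (0-9).
    # The train pairs demonstrate the transformation (here, a horizontal
    # mirror); the model must predict the test output.
    task = {
        "train": [
            {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
            {"input": [[3, 0, 0], [0, 4, 0]], "output": [[0, 0, 3], [0, 4, 0]]},
        ],
        "test": [{"input": [[5, 0], [0, 6]]}],
    }

    prompt = (
        "Each grid is a JSON array of rows; cells are color indices 0-9.\n"
        "Infer the transformation from the train examples and reply with the "
        "test output grid as JSON only.\n\n" + json.dumps(task, indent=2)
    )
    print(prompt)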

YeGoblynQueenne (No.40716335)
>> GPT-4 did "okay", but in some cases it felt like it was falling for the classic LLM issue of saying all the right things but then failing to grasp some critical bit of logic and missing the solution entirely.

It still is. It misses the solution so comprehensively that it needs an outer loop to figure out which of the ~8k programs GPT-4o generates is actually the solution.
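
Roughly, that outer loop is just guess-and-check: sample thousands of candidate programs, execute each one, and keep whichever reproduces the training pairs. A sketch of the general idea (my own simplification, not Redwood's actual code):

    from typing import Callable, Optional

    Grid = list[list[int]]

    def select_program(candidates: list[str],
                       train_pairs: list[tuple[Grid, Grid]]) -> Optional[Callable]:
        """Return the first candidate whose transform() reproduces every train pair."""
        for src in candidates:          # e.g. ~8k samples from GPT-4o
            namespace: dict = {}
            try:
                exec(src, namespace)    # each candidate defines transform(grid) -> grid
                fn = namespace["transform"]
                if all(fn(x) == y for x, y in train_pairs):
                    return fn           # passed all train examples; run it on the test input
            except Exception:
                continue                # syntax errors, crashes, wrong shapes, etc.
        return None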

ealexhudson (No.40717373)
We don't really know what GPT-4 "is". I remember reading a number of relatively well-informed suggestions that there are several models inside there, and that the API we interact with is some form of outer loop around them.

I don't think the location or design of the outer loop really makes much difference. There is no flock of birds without the individuals: the flock itself doesn't really exist as a tangible thing, but it arises out of the collective adjustments among all those individuals. Similarly, we may find that groups of LLMs and various outer control loops give rise to emergent phenomena much greater than the sum of their parts.

YeGoblynQueenne (No.40717527)
>> We don't really know what GPT-4 "is".

Yes, we do. It's a language model.