Getting 50% (SoTA) on Arc-AGI with GPT-4o

>> Claim 1 seems likely true to me for a reasonable notion of “learning”. I think François Chollet agrees here. Most of my doubts about this claim are concerns that you can basically brute force ARC-AGI without interestingly doing learning (e.g. brute-force search over some sort of DSL or training on a huge array of very similar problems). These concerns apply much less to the kind of approach I used

The approach described in the article is exactly "brute-force search over some sort of DSL". The "DSL" is a model of Python syntax that GPT-4o has learned after training on the entire internet. This "DSL" is locked up in the black box of GPT-4o's weights, but just because no-one can see it, it doesn't mean it's not there; and we can see GPT-4o generating Python programs, so we know it is there, even if we don't know what it looks like.

That DSL may not be "domain specific" in the sense of being specifically tailored to solve ARC-AGI tasks, or any other particular task, but it is "domain specific" in the sense of generating Python programs for some subset of all possible Python programs that includes programs that can solve some ARC-AGI tasks. That's a very broad category, but that's why it over-generates so much: it needs to draw 8k samples total until one works for just 50% of the public eval set.