
Getting 50% (SoTA) on Arc-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points by tomduncalf | 3 comments
eigenvalue No.40712174
The Arc stuff just felt intuitively wrong as soon as I heard it. I don't find any of Chollet's critiques of LLMs to be convincing. It's almost as if he's being overly negative about them to make a point or something to push back against all the unbridled optimism. The problem is, the optimism really seems to be justified, and the rate of improvement of LLMs in the past 12 months has been nothing short of astonishing.

So it's not at all surprising to me to see Arc already being mostly solved using existing models, just with different prompting techniques and some tool usage. At some point, the naysayers about LLMs are going to have to confront the problem that, if they are right about LLMs not really thinking/understanding/being sentient, then a very large percentage of people living today are also not thinking/understanding/sentient!

replies(11): >>40712233 #>>40712290 #>>40712304 #>>40712352 #>>40712385 #>>40712431 #>>40712465 #>>40712713 #>>40713110 #>>40713491 #>>40714220 #
TacticalCoder No.40712431
> I don't find any of Chollet's critiques of LLMs to be convincing. It's almost as if he's being overly negative about them to make a point or something to push back against all the unbridled optimism.

Chollet published his paper "On the Measure of Intelligence" in 2019. In Internet time, that is a lifetime before the LLM hype started.

replies(2): >>40712651 #>>40712888 #
1. gwern No.40712888
From Chollet's perspective, the LLM hype started well before, with at least GPT-2 half a year before his paper, and he spent plenty of time mocking GPT-2 on Twitter before he came up with ARC as a rebuttal.
replies(1): >>40713697 #
2. modeless No.40713697
It's a very convincing rebuttal considering that GPT-3 and GPT-4 came out after ARC but made no significant progress on it. He seemingly had the single most accurate and verifiable prediction of anyone in the world (in 2019) about exactly what type of tasks scaled LLMs would be bad at.
replies(1): >>40729440 #
3. gwern No.40729440
Well, that's true inasmuch as every other prediction did far worse. Saying ARC did the best is passing a low bar when your competition is people like Gary Marcus or HNers saying 'yeah but no scaled-up GPT-3 could ever write a whole program'...

But since ARC was from the start clearly a vision task - most of its transforms and rules make no sense without a visual geometric prior - it wasn't that convincing, and we now see plenty of progress on it with LLMs.