Getting 50% (SoTA) on Arc-AGI with GPT-4o

(redwoodresearch.substack.com)

394 points tomduncalf | 2 comments | 17 Jun 24 21:51 UTC

eigenvalue ◴[17 Jun 24 23:00 UTC] No.40712174[source]▶

The Arc stuff just felt intuitively wrong as soon as I heard it. I don't find any of Chollet's critiques of LLMs to be convincing. It's almost as if he's being overly negative about them to make a point or something to push back against all the unbridled optimism. The problem is, the optimism really seems to be justified, and the rate of improvement of LLMs in the past 12 months has been nothing short of astonishing.

So it's not at all surprising to me to see Arc already being mostly solved using existing models, just with different prompting techniques and some tool usage. At some point, the naysayers about LLMs are going to have to confront the problem that, if they are right about LLMs not really thinking/understanding/being sentient, then a very large percentage of people living today are also not thinking/understanding/sentient!

replies(11): >>40712233 #>>40712290 #>>40712304 #>>40712352 #>>40712385 #>>40712431 #>>40712465 #>>40712713 #>>40713110 #>>40713491 #>>40714220 #

TacticalCoder ◴[17 Jun 24 23:31 UTC] No.40712431[source]▶

>>40712174 #

> I don't find any of Chollet's critiques of LLMs to be convincing. It's almost as if he's being overly negative about them to make a point or something to push back against all the unbridled optimism.

Chollet published his paper "On the Measure of Intelligence" in 2019. In Internet time, that is a lifetime before the LLM hype started.

replies(2): >>40712651 #>>40712888 #

refulgentis ◴[18 Jun 24 00:02 UTC] No.40712651[source]▶

>>40712431 #

Einstein, infamously, couldn't really make much progress with quantum physics, even though he invented the precursors (e.g., Brownian motion). Your world model is hard to update.

replies(2): >>40713512 #>>40713624 #

imperfect_light ◴[18 Jun 24 02:28 UTC] No.40713512[source]▶

>>40712651 #

A bit of a stretch given that Chollet is a researcher in deep learning and transformers and his criticism is that memorization (training LLMs on lots and lots of problems) doesn't equate to AGI.

replies(1): >>40713693 #

refulgentis ◴[18 Jun 24 03:04 UTC] No.40713693[source]▶

>>40713512 #

> A bit of a stretch

Is that true?

Cf. what we're discussing.

He's actively encouraging using LLMs to solve his benchmark, called ARC AGI.

8 hours ago, from Chollet, re: TFA

"The best solution to fight combinatorial explosion is to leverage intuition over the structure of program space, provided by a deep learning model. For instance, you can use a LLM to sample a program..."

Source: https://x.com/fchollet/status/1802801425514410275

replies(1): >>40713793 #

imperfect_light ◴[18 Jun 24 03:25 UTC] No.40713793[source]▶

>>40713693 #

The stretch was in reference to comparing Chollet to Einstein. Chollet clearly understands LLMs (and transformers and deep learning); he simply doesn't believe they are sufficient for AGI.

replies(1): >>40714132 #

refulgentis ◴[18 Jun 24 04:39 UTC] No.40714132[source]▶

>>40713793 #

I don't know what you mean; it's a straightforward analogy. But yes, that's right, except for the part where he's heralding this news by telling people that LLMs are an underexplored solution space for his AGI benchmark, the one he made to show that LLMs are not AGI.

I don't mean to offend, but to be really straightforward: he's the one saying it's possible they might be AGI now. I'm as flummoxed as you, but I think it's hiding the ball to file it under "he doesn't mean what he's saying, because he doesn't believe LLMs can ever be AGI." The only steelman for that is playing at: AGI-my-benchmark, which I say is for AGI, is not the AGI I mean.

replies(1): >>40714424 #

imperfect_light ◴[18 Jun 24 05:51 UTC] No.40714424[source]▶

>>40714132 #

You're reading a whole lot into a tweet. In his interview with Dwarkesh Patel he says, about 20 different times, that scaling LLMs (as they are currently conceived) won't lead to AGI.

replies(1): >>40714964 #

anoncareer0212 ◴[18 Jun 24 07:17 UTC] No.40714964[source]▶

>>40714424 #

You keep changing topics, so I don't get it either. I can attest it's not a fringe view that the situation is interesting; I've seen it discussed several times today by unrelated people.

replies(1): >>40720617 #

imperfect_light ◴[18 Jun 24 18:20 UTC] No.40720617[source]▶

>>40714964 #

He's said it pretty clearly: an LLM could be part of the solution in combination with program synthesis, but an LLM alone won't achieve AGI.
