Getting 50% (SoTA) on ARC-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points by tomduncalf | 2 comments
atleastoptimal ◴[] No.40714152[source]
I'll say what a lot of people seem to be denying. GPT-4 is an AGI, just a very bad one. Even GPT-1 was an AGI. There isn't a hard boundary between non-AGI and AGI. A lot of people wish there were, so they imagine absolutes regarding LLMs, like "they cannot create anything new" or something like that. Just think: we consider humans a general intelligence, but obviously wouldn't consider an embryo or infant a general intelligence. So at what point does a human go from not generally intelligent to generally intelligent? And I don't mean an age or brain size, I mean a suite of testable abilities.

Intelligence is an ability that is naturally gradual and emerges over many domains. It is a collection of tools via which general abstractive principles can be applied, not a singular universally applicable ability to think in abstractions. GPT-4, compared to a human, is a very very small brain trained for the single purpose of textual thinking with some image capabilities. Claiming that ARC is the absolute marker of general intelligence fails to account for the big picture of what intelligence is.

replies(7): >>40714189 #>>40714191 #>>40714565 #>>40715248 #>>40715346 #>>40715384 #>>40716518 #
blharr ◴[] No.40714189[source]
The "general" part of AGI implies it should be capable across all types of different tasks. I would definitely call it real Artificial Intelligence, but it's not general by any means.
replies(1): >>40714598 #
FeepingCreature ◴[] No.40714598[source]
It's capable of attempting all types of different tasks. That is a novel capability on its own. We're used to GPT's amusing failures at this point, so we forget that there is absolutely no input you could hand to a chess program that would get it to try and play checkers.

Not so with GPT. It will try, and fail, but that it tries at all was unimaginable five years ago.

replies(1): >>40715160 #
dahart ◴[] No.40715160[source]
It's amusing to me how the very language used to describe GPT anthropomorphizes it. GPT won't "attempt" or "try" anything on its own without a human telling it what to try; it has no agenda, no will, no agency, no self-reflection, no initiative, no fear, and no desire. It's all A and no I.
replies(2): >>40715775 #>>40719521 #
FeepingCreature ◴[] No.40715775[source]
Do you agree that "there is absolutely no input you could hand to a chess program that would get it to try and play checkers", but there is an input you can hand to GPT-3+ that will get it to try and play pretty much any game imaginable, so long as you agree that its attempt will be very poor?

I don't want to get into the weeds on what intelligence is or what "attempt" means or "try" means (you can probably guess I disagree with your position), but do you have a disagreement on pure input/output behavior? Do you disagree that if I put adequate words in, words will come out that will resemble an attempt to do the task, for nearly any task that exists?

replies(1): >>40720764 #
dahart ◴[] No.40720764[source]
You’re trying to avoid addressing my point. What can GPT do that’s interesting without a human in the loop doing the prompting?

Lol “very poor”. You’re attempting to argue that if there’s any output at all in response to an input prompt, then GPT is “trying” and showing signs of intelligence, no matter what the output is. By this logic, you contradicted yourself: the chess engine can play checkers, poorly. By this logic, asking the sky to play a game means the sky is trying because it changes, or asking a random number generator to play a game means it resembles an attempt to play because there is “very poor” output.

There are lots of games GPT can’t play, like hide-and-seek, tag, and tennis. Playing a game means playing by the rules of the game, giving coherent output, and trying to win. GPT can’t play games it hasn’t seen before, and no I don’t agree that “very poor” output counts. It doesn’t (currently) learn the rules from your prompts; you can’t teach it to play a new game by talking to it, and the “very poor” output from a game it wasn’t trained on will never improve. And, to my actual point, GPT will not play any games at all unless you ask it to.

replies(2): >>40721802 #>>40725464 #
CamperBob2 ◴[] No.40721802{3}[source]
> What can GPT do that's interesting without a human in the loop doing the prompting?

Understand what the human in the loop doing the prompting is asking for, for one thing.

The magical aspects of LLMs are on the input side, not the output.

replies(1): >>40722148 #
dahart ◴[] No.40722148{4}[source]
This probably isn’t what you meant, but if the magic is on the input side, then everything interesting about interacting with GPT is being provided by the human and not GPT.

We don’t have any strong evidence that GPT “understands” its input in general. We absolutely have examples of GPT failing to understand some inputs (and not knowing it, and insisting on bogus output). And we know for a fact that it was designed and built to produce statistically plausible output. GPT is a mechanical device designed by humans to pass the Turing test. We’ve designed and built something that is exceptionally good at making humans believe it is smarter than it is.

replies(1): >>40722551 #
CamperBob2 ◴[] No.40722551{5}[source]
> We've designed and built something that is exceptionally good at making humans believe it is smarter than it is.

Yep, and even ELIZA could do that, to some extent. But at some point you'll need to define what "understanding" means, and explain why an LLM isn't doing it.

replies(1): >>40722950 #
dahart ◴[] No.40722950[source]
Totally, and that's a fair point. I don't know what understanding means, not enough to prove an LLM can't, anyway, and I think nobody has a good enough definition yet to satisfy this crowd. But I think we can make progress with nothing more than the dictionary definition of "understand", which is the ability to perceive and interpret. I think we can probably agree that a rock doesn't understand. And we can probably also agree that a random number generator doesn't understand.

The problem with @FeepingCreature's argument is that the quality of the response does matter. The ability of a machine that's specifically designed to wait for input and then provide an output to then provide a low-quality response doesn't demonstrate any more intelligence than a bicycle… right?

I don't know where the line is between my random-writer Markov chain text generator from college and today's LLMs. I'm told transformers are fundamentally the same and just have an adaptive window size; more training data, then, is the primary difference. So then are we saying Excel's least-squares function fitter does not understand, unless the function has a billion data points? Or, if there's a line, what does it look like and where is it?
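For concreteness, the "random writer" described above is just next-word sampling from observed counts: record which word follows each short context in a corpus, then repeatedly sample from those counts. A minimal sketch in Python (the function names and toy corpus are illustrative, not taken from the linked post):

    # A minimal "random writer": count which word follows each short context
    # in a corpus, then sample the next word from those counts.
    import random
    from collections import defaultdict, Counter

    def build_model(words, order=2):
        # Map each `order`-word context to a Counter of the words that follow it.
        model = defaultdict(Counter)
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            model[context][words[i + order]] += 1
        return model

    def generate(model, order=2, length=30):
        # Start from a random seen context and repeatedly sample the next word.
        out = list(random.choice(list(model)))
        for _ in range(length):
            counts = model.get(tuple(out[-order:]))
            if not counts:
                break  # this context never appeared in the training text
            next_word = random.choices(list(counts), weights=list(counts.values()))[0]
            out.append(next_word)
        return " ".join(out)

    corpus = "the cat sat on the mat and the cat saw the dog on the mat".split()
    print(generate(build_model(corpus)))

The point of comparison in the comment is that both this and an LLM predict the next token from preceding context; the difference is that a transformer learns a parametric model over contexts rather than a literal table of counts.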