←back to thread

GPT-5.2

(openai.com)
1019 points atgctg | 1 comments | | HN request time: 0s | source
Show context
josalhor ◴[] No.46235005[source]
From GPT 5.1 Thinking:

ARC AGI v2: 17.6% -> 52.9%

SWE Verified: 76.3% -> 80%

That's pretty good!

replies(7): >>46235062 #>>46235070 #>>46235153 #>>46235160 #>>46235180 #>>46235421 #>>46236242 #
verdverm ◴[] No.46235062[source]
We're also in benchmark saturation territory. I heard it speculated that Anthropic emphasizes benchmarks less in their publications because internally they don't care about them nearly as much as making a model that works well on the day-to-day
replies(5): >>46235126 #>>46235266 #>>46235466 #>>46235492 #>>46235583 #
HDThoreaun ◴[] No.46235492[source]
Arc-AGI is just an iq test. I don’t see the problem with training it to be good at iq tests because that’s a skill that translates well.
replies(3): >>46236017 #>>46236535 #>>46236978 #
CamperBob2 ◴[] No.46236017{3}[source]
Exactly. In principle, at least, the only way to overfit to Arc-AGI is to actually be that smart.

Edit: if you disagree, try actually TAKING the Arc-AGI 2 test, then post.

replies(5): >>46236205 #>>46236247 #>>46236865 #>>46237072 #>>46237171 #
jimbokun ◴[] No.46236865{4}[source]
Is it different every time? Otherwise the training could just memorize the answers.
replies(1): >>46236877 #
1. CamperBob2 ◴[] No.46236877{5}[source]
The models never have access to the answers for the private set -- again, at least in principle. Whether that's actually true, I have no idea.

The idea behind Arc-AGI is that you can train all you want on the answers, because knowing the solution to one problem isn't helpful on the others.

In fact, the way the test works is that the model is given several examples of worked solutions for each problem class, and is then required to infer the underlying rule(s) needed to solve a different instance of the same type of problem.

That's why comparing Arc-AGI to chess or other benchmaxxing exercises is completely off base.

(IMO, an even better test for AGI would be "Make up some original Arc-AGI problems.")