GPT-5.2

(openai.com)

1019 points by atgctg | 1 comment | 11 Dec 25 18:04 UTC

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

josalhor [11 Dec 25 18:24 UTC] No.46235005

>>46234788 (OP)

From GPT 5.1 Thinking:

ARC AGI v2: 17.6% -> 52.9%

SWE-bench Verified: 76.3% -> 80%

That's pretty good!

verdverm [11 Dec 25 18:28 UTC] No.46235062

>>46235005

We're also in benchmark saturation territory. I've heard it speculated that Anthropic emphasizes benchmarks less in its publications because internally it cares far less about them than about making a model that works well day to day.

HDThoreaun [11 Dec 25 18:56 UTC] No.46235492

>>46235062

Arc-AGI is just an IQ test. I don’t see the problem with training it to be good at IQ tests, because that’s a skill that translates well.

CamperBob2 [11 Dec 25 19:29 UTC] No.46236017

>>46235492

Exactly. In principle, at least, the only way to overfit to Arc-AGI is to actually be that smart.

Edit: if you disagree, try actually TAKING the Arc-AGI 2 test, then post.

esafak [11 Dec 25 19:50 UTC] No.46236247

>>46236017

I would not be so sure. You can always prep for the test.

HDThoreaun [11 Dec 25 20:18 UTC] No.46236590

>>46236247

How do you prep for ARC-AGI? If the answer is just "get really good at pattern recognition," I don't see that as a negative at all.

ben_w [11 Dec 25 22:25 UTC] No.46238125

>>46236590

It can be not-negative without being sufficient.

Imagine that pattern recognition is 10% of the problem, and we just don't know what the other 90% is yet.

The streetlight effect applied to "what is intelligence" leads us to all the things that LLMs are now demonstrably good at… and yet the LLMs are somehow missing a lot, and we have to keep inventing new streetlights to search under: https://en.wikipedia.org/wiki/Streetlight_effect

HDThoreaun [11 Dec 25 23:10 UTC] No.46238638

>>46238125

I don't think many people are saying a 100% score on ARC-AGI 2 is equivalent to AGI (names are dumb as usual). It's just the best metric I have found, not the final answer. Spatial reasoning is an important part of intelligence even if it doesn't encompass all of it.
