←back to thread

GPT-5.2

(openai.com)
1019 points atgctg | 2 comments | | HN request time: 0s | source
Show context
josalhor ◴[] No.46235005[source]
From GPT 5.1 Thinking:

ARC AGI v2: 17.6% -> 52.9%

SWE Verified: 76.3% -> 80%

That's pretty good!

replies(7): >>46235062 #>>46235070 #>>46235153 #>>46235160 #>>46235180 #>>46235421 #>>46236242 #
verdverm ◴[] No.46235062[source]
We're also in benchmark saturation territory. I heard it speculated that Anthropic emphasizes benchmarks less in their publications because internally they don't care about them nearly as much as making a model that works well on the day-to-day
replies(5): >>46235126 #>>46235266 #>>46235466 #>>46235492 #>>46235583 #
HDThoreaun ◴[] No.46235492[source]
Arc-AGI is just an iq test. I don’t see the problem with training it to be good at iq tests because that’s a skill that translates well.
replies(3): >>46236017 #>>46236535 #>>46236978 #
CamperBob2 ◴[] No.46236017{3}[source]
Exactly. In principle, at least, the only way to overfit to Arc-AGI is to actually be that smart.

Edit: if you disagree, try actually TAKING the Arc-AGI 2 test, then post.

replies(5): >>46236205 #>>46236247 #>>46236865 #>>46237072 #>>46237171 #
esafak ◴[] No.46236247{4}[source]
I would not be so sure. You can always prep to the test.
replies(1): >>46236590 #
HDThoreaun ◴[] No.46236590{5}[source]
How do you prep for arc agi? If the answer is just "get really good at pattern recognition" I do not see that as a negative at all.
replies(1): >>46238125 #
1. ben_w ◴[] No.46238125{6}[source]
It can be not-negative without being sufficient.

Imagine that pattern recognition is 10% of the problem, and we just don't know what the other 90% is yet.

Streetlight effect for "what is intelligence" leads to all the things that LLMs are now demonstrably good at… and yet, the LLMs are somehow missing a lot of stuff and we have to keep inventing new street lights to search underneath: https://en.wikipedia.org/wiki/Streetlight_effect

replies(1): >>46238638 #
2. HDThoreaun ◴[] No.46238638[source]
I dont think many people are saying 100% arc-agi 2 is equivalent to AGI(names are dumb as usual). Its just the best metric I have found, not the final answer. Spatial reasoning is an important part of intelligence even if it doesnt encompass all of it.