(openai.com)

1053 points atgctg | 1 comments | 11 Dec 25 18:04 UTC

josalhor [11 Dec 25 18:24 UTC] No.46235005

From GPT 5.1 Thinking:

ARC AGI v2: 17.6% -> 52.9%

SWE Verified: 76.3% -> 80%

That's pretty good!

minimaxir [11 Dec 25 18:35 UTC] No.46235160

Note that GPT 5.2 newly supports an "xhigh" reasoning level, which could explain the better benchmarks.

It'll be noteworthy to see the cost-per-task on ARC AGI v2.

walletdrainer [11 Dec 25 19:44 UTC] No.46236189

5.1-codex supports that too, no? Pretty sure I’ve been using xhigh for at least a week now.

GPT-5.2