Compared with GPT 5.1 Thinking:
ARC-AGI-2: 17.6% -> 52.9%
SWE-bench Verified: 76.3% -> 80%
That's pretty good!
replies(7):
Already live. gpt-5.2-pro scores a new high of 54.2% with a cost/task of $15.72. The previous best was Gemini 3 Pro (54% with a cost/task of $30.57).
The best bang for your buck is the new xhigh setting on gpt-5.2, which scores 52.9% for $1.90 per task. That is a big improvement on the previous best in this category, Opus 4.5 (37.6% for $2.40).
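To make the cost comparison concrete, here is a rough sketch that turns the figures quoted above into score points per dollar. The "points per dollar" metric and the helper name `points_per_dollar` are my own assumptions for illustration, not something the leaderboard reports, and a plain ratio glosses over the fact that the last few points near the top cost far more than the first few.

```python
# Scores (%) and cost per task (USD), copied from the figures quoted above.
results = {
    "gpt-5.2-pro": (54.2, 15.72),
    "Gemini 3 Pro": (54.0, 30.57),
    "gpt-5.2 xhigh": (52.9, 1.90),
    "Opus 4.5": (37.6, 2.40),
}


def points_per_dollar(score, cost):
    """Hypothetical efficiency metric: score points bought per dollar per task."""
    return score / cost


# Rank entries from most to least cost-efficient under this metric.
ranked = sorted(
    results,
    key=lambda name: points_per_dollar(*results[name]),
    reverse=True,
)

for name in ranked:
    score, cost = results[name]
    ratio = points_per_dollar(score, cost)
    print(f"{name:14s} {score:5.1f}% at ${cost:6.2f}/task = {ratio:5.2f} points per dollar")
```

On this crude measure gpt-5.2 xhigh comes out at roughly 28 points per dollar against roughly 16 for Opus 4.5, while gpt-5.2-pro gets about twice as many points per dollar as Gemini 3 Pro at nearly the same score.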