
GPT-5.2

(openai.com)
1019 points atgctg | 2 comments
Tiberium ◴[] No.46235157[source]
The only table where they showed comparisons against Opus 4.5 and Gemini 3:

https://x.com/OpenAI/status/1999182104362668275

https://i.imgur.com/e0iB8KC.png

replies(1): >>46235285 #
varenc ◴[] No.46235285[source]
100% on the AIME (assuming it's not in the training data) is pretty impressive. I got like 4/15 when I was in HS...
replies(1): >>46236186 #
hellojimbo ◴[] No.46236186[source]
The no-tools part is what's impressive; with tools, every model gets 100%.
replies(1): >>46238952 #
1. varenc ◴[] No.46238952[source]
If I recall, the AIME answers are always 4-digit numbers. And most of the problems are the type where, if you have a candidate number, it's reasonably easy to validate its correctness. So it's easy to brute-force all 4-digit ints with code.
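
A minimal sketch of that brute-force idea, in Python. Two caveats: in the official format AIME answers are integers from 000 to 999, so the loop below only covers that range, and the problem used here is a made-up toy (not an actual AIME question) chosen because a candidate is trivial to check. The names `brute_force_answer` and `square_ends_in_444` are hypothetical, and this says nothing about how any model actually uses its tools.

```python
from typing import Callable, Iterable, Optional


def brute_force_answer(
    is_valid: Callable[[int], bool],
    candidates: Iterable[int] = range(1000),  # assumption: answers lie in 0..999
) -> Optional[int]:
    """Return the first candidate that passes the check, or None if none does."""
    for n in candidates:
        if is_valid(n):
            return n
    return None


# Hypothetical toy problem (not a real AIME question):
# find the least positive integer whose square ends in the digits 444.
def square_ends_in_444(n: int) -> bool:
    return n > 0 and (n * n) % 1000 == 444


answer = brute_force_answer(square_ends_in_444)
print(answer)  # 38, since 38 * 38 == 1444
```

This only works when checking a candidate is much cheaper than deriving the answer; for problems where verification is as hard as solving (many counting or probability questions), the loop buys nothing.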

tl;dr: humans would do much better too if they could use programming tools :)

replies(1): >>46240606 #
2. Davidzheng ◴[] No.46240606[source]
Uh, no. When it uses tools, it doesn't solve the problems by looping over 4-digit numbers.