
GPT-5.2

(openai.com)

1019 points by atgctg | 2 comments | 11 Dec 25 18:04 UTC

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...


Tiberium [11 Dec 25 18:35 UTC] No.46235157

>>46234788 (OP)

The only table where they showed comparisons against Opus 4.5 and Gemini 3:

https://x.com/OpenAI/status/1999182104362668275

https://i.imgur.com/e0iB8KC.png


varenc [11 Dec 25 18:45 UTC] No.46235285

100% on the AIME (assuming it's not in the training data) is pretty impressive. I got like 4/15 when I was in HS...


hellojimbo [11 Dec 25 19:44 UTC] No.46236186

The no-tools part is impressive; with tools, every model gets 100%.


varenc [11 Dec 25 23:41 UTC] No.46238952

If I recall, the AIME answers are always 4-digit numbers. And most of the problems are the type where, if you have a candidate number, it's reasonable to validate its correctness. So it's easy to brute-force all 4-digit ints with code.

tl;dr: humans would do much better too if they could use programming tools :)
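As a rough illustration of the brute-force idea in the comment above, here is a minimal sketch: enumerate every candidate answer and keep the ones that pass a validity check. Official AIME answers are integers from 000 to 999, so the sketch searches that range by default. The problem and the names `brute_force_answers` and `square_ends_in_444` are hypothetical toy examples, not an actual AIME question, and this does not show how any model actually uses tools.

```python
from typing import Callable, List


def brute_force_answers(
    is_valid: Callable[[int], bool], lo: int = 0, hi: int = 999
) -> List[int]:
    """Return every integer in [lo, hi] (inclusive) that passes the validity check."""
    return [n for n in range(lo, hi + 1) if is_valid(n)]


# Hypothetical toy problem (not a real AIME question):
# "Find the smallest positive integer whose square ends in the digits 444."
def square_ends_in_444(n: int) -> bool:
    """Check a candidate: positive, and its square's last three digits are 444."""
    return n > 0 and (n * n) % 1000 == 444


candidates = brute_force_answers(square_ends_in_444)
print(candidates[0])  # 38, since 38 * 38 = 1444
```

The approach only works when checking a candidate is cheap and reliable; for problems where the answer is a count or an optimum that cannot be validated in isolation, enumeration alone is not enough.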


Davidzheng [12 Dec 25 03:36 UTC] No.46240606

>>46238952 (TP)

Uh, no, it's not solved by looping over 4-digit numbers when it uses tools.