(openai.com)

1019 points atgctg | 2 comments | 11 Dec 25 18:04 UTC

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

josalhor ◴[11 Dec 25 18:24 UTC] No.46235005[source]▶

>>46234788 (OP) #

From GPT 5.1 Thinking:

ARC AGI v2: 17.6% -> 52.9%

SWE Verified: 76.3% -> 80%

That's pretty good!

replies(7): >>46235062 #>>46235070 #>>46235153 #>>46235160 #>>46235180 #>>46235421 #>>46236242 #

1. poormathskills ◴[11 Dec 25 18:29 UTC] No.46235070[source]▶

>>46235005 #

For a minor version update (5.1 -> 5.2) that's a way bigger improvement than I would have guessed.

replies(1): >>46235317 #

2. beering ◴[11 Dec 25 18:46 UTC] No.46235317[source]▶

>>46235070 (TP) #

Model capability improvements are very uneven. Changes between one model and the next tend to benefit certain areas substantially without moving the needle on others. You see this across all frontier labs’ model releases. Also the version numbering is BS (remember GPT-4.5 followed by GPT-4.1?).

GPT-5.2