(openai.com)

1019 points atgctg | 4 comments | 11 Dec 25 18:04 UTC

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

zone411 ◴[11 Dec 25 19:46 UTC] No.46236209[source]▶

>>46234788 (OP) #

I've benchmarked it on the Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/):

The high-reasoning version of GPT-5.2 improves on GPT-5.1: 69.9 → 77.9.

The medium-reasoning version also improves: 62.7 → 72.1.

The no-reasoning version also improves: 22.1 → 27.5.

Gemini 3 Pro and Grok 4.1 Fast Reasoning still score higher.
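For reference, a minimal sketch of the arithmetic behind those deltas, using only the scores quoted above. The dictionary keys and the `gains` helper are illustrative labels, not anything from the benchmark repo.

```python
# Scores quoted in the comment above: (GPT-5.1, GPT-5.2) per reasoning setting.
# The setting names are illustrative labels, not identifiers from the benchmark.
scores = {
    "high": (69.9, 77.9),
    "medium": (62.7, 72.1),
    "none": (22.1, 27.5),
}


def gains(pairs):
    """Return (absolute gain in points, relative gain in %) for each setting."""
    out = {}
    for name, (old, new) in pairs.items():
        delta = new - old
        # Round to one decimal to match the precision of the quoted scores.
        out[name] = (round(delta, 1), round(100 * delta / old, 1))
    return out


for name, (points, percent) in gains(scores).items():
    print(f"{name}: +{points} points ({percent}% relative)")
```

The absolute gains are similar across settings, while the relative gain is largest for the no-reasoning version because its baseline is much lower.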

replies(4): >>46236325 #>>46236642 #>>46237650 #>>46241682 #

1. tikotus ◴[11 Dec 25 20:24 UTC] No.46236642[source]▶

>>46236209 #

Here's someone else testing models on a daily logic puzzle (Clues by Sam): https://www.nicksypteras.com/blog/cbs-benchmark.html GPT 5 Pro was already the winner in that test.

replies(2): >>46236715 #>>46236827 #

2. thanhhaimai ◴[11 Dec 25 20:30 UTC] No.46236715[source]▶

>>46236642 (TP) #

This link doesn't have Gemini 3 performance on it. Do you have an updated link with the new models?

replies(1): >>46241420 #

3. crapple8430 ◴[11 Dec 25 20:39 UTC] No.46236827[source]▶

>>46236642 (TP) #

GPT 5 Pro is a good 10x more expensive, so it's an apples-to-oranges comparison.

4. dezgeg ◴[12 Dec 25 06:30 UTC] No.46241420[source]▶

>>46236715 #

I've also tried Gemini 3 on Clues by Sam and it does really well: I haven't seen it make a single mistake, even on Hard and Tricky puzzles. I haven't run it on many puzzles, though.