GPT-5.2

(openai.com)
1019 points atgctg | 4 comments
zone411 ◴[] No.46236209[source]
I've benchmarked it on the Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/):

The high-reasoning version of GPT-5.2 improves on GPT-5.1: 69.9 → 77.9.

The medium-reasoning version also improves: 62.7 → 72.1.

The no-reasoning version also improves: 22.1 → 27.5.

Gemini 3 Pro and Grok 4.1 Fast Reasoning still score higher.

replies(4): >>46236325 #>>46236642 #>>46237650 #>>46241682 #
1. tikotus ◴[] No.46236642[source]
Here's someone else testing models on a daily logic puzzle (Clues by Sam): https://www.nicksypteras.com/blog/cbs-benchmark.html GPT 5 Pro was already the winner in that test.
replies(2): >>46236715 #>>46236827 #
2. thanhhaimai ◴[] No.46236715[source]
This link doesn't have Gemini 3 performance on it. Do you have an updated link with the new models?
replies(1): >>46241420 #
3. crapple8430 ◴[] No.46236827[source]
GPT 5 Pro is a good 10x more expensive, so it's an apples-to-oranges comparison.
4. dezgeg ◴[] No.46241420[source]
I've also tried Gemini 3 on Clues by Sam and it does really well; I have not seen it make a single mistake, even on the Hard and Tricky ones. I haven't run it on many puzzles, though.