GPT-5.2

(openai.com)
1019 points atgctg | 1 comments
simonw ◴[] No.46235580[source]
Wow, there's a lot going on with this pelican riding a bicycle: https://gist.github.com/simonw/c31d7afc95fe6b40506a9562b5e83...
replies(12): >>46235608 #>>46236119 #>>46236455 #>>46236615 #>>46236751 #>>46236849 #>>46237862 #>>46237969 #>>46238631 #>>46239729 #>>46240577 #>>46240638 #
alechewitt ◴[] No.46239729[source]
Nice work on these benchmarks, Simon. I’ve followed your blog closely since your great talk at the AI Engineer World’s Fair, and I want to say thank you for all the high-quality content you share for free. It’s become my primary source for keeping up to date.

I’ve been working on a few benchmarks to test how well LLMs can recreate interfaces from screenshots (https://github.com/alechewitt/llm-ui-challenge). From my basic tests, GPT-5.2 seems slightly better at these UI recreations. For example, in the MS Word replica, it implemented the undo/redo buttons as well as the bold/italic formatting that GPT-5.1 handled, and it generally seemed a bit closer to the original screenshot (https://alechewitt.github.io/llm-ui-challenge/outputs/micros...).

In the VS Code test, it also added the tabs that weren’t visible in the screenshot! (https://alechewitt.github.io/llm-ui-challenge/outputs/vs_cod...)

replies(1): >>46239894 #
1. simonw ◴[] No.46239894[source]
That is a very good benchmark. Interesting to see GPT-5.2 delivering on the promise of better vision support there.