GPT-5.2

(openai.com)
1053 points atgctg | 26 comments
1. simonw ◴[] No.46235580[source]
Wow, there's a lot going on with this pelican riding a bicycle: https://gist.github.com/simonw/c31d7afc95fe6b40506a9562b5e83...
replies(12): >>46235608 #>>46236119 #>>46236455 #>>46236615 #>>46236751 #>>46236849 #>>46237862 #>>46237969 #>>46238631 #>>46239729 #>>46240577 #>>46240638 #
2. minimaxir ◴[] No.46235608[source]
Is that the first SVG pelican with drop shadows?
replies(1): >>46236157 #
3. tmaly ◴[] No.46236119[source]
seems to be eating something
replies(1): >>46236291 #
4. simonw ◴[] No.46236157[source]
No, I got drop shadows from DeepSeek 3.2 recently https://simonwillison.net/2025/Dec/1/deepseek-v32/ (probably others as well.)
5. danans ◴[] No.46236291[source]
Probably a jellyfish. You're seeing the tentacles
6. belter ◴[] No.46236455[source]
What happens if you ask for a pterodactyl on a motorbike?

Would like to know how much they are optimizing for your pelican....

replies(1): >>46236719 #
7. fxwin ◴[] No.46236615[source]
the only benchmark i trust
8. simonkagedal ◴[] No.46236719[source]
He commented on this here: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
replies(1): >>46237762 #
9. BeetleB ◴[] No.46236751[source]
They probably saw your complaint that 5.1 was too spartan and a regression (I had the same experience with 5.1 in the POV-Ray version - have yet to try 5.2 out...).
10. Stevvo ◴[] No.46236849[source]
The variance is way too high for this test to have any value at all. I ran it 10 times, and each pelican on a bicycle was a better rendition than that; you could say about half of them were perfect.
replies(3): >>46237560 #>>46240319 #>>46241401 #
11. golly_ned ◴[] No.46237560[source]
Compared to the other benchmarks which are much more gameable, I trust PelicanBikeEval way more.
replies(2): >>46239011 #>>46239406 #
12. irthomasthomas ◴[] No.46237762{3}[source]
I was expecting to see a pterodactyl :(
13. AstroBen ◴[] No.46237862[source]
Seems to be getting more aerodynamic. A clear sign of AI intelligence
14. sroussey ◴[] No.46237969[source]
What is good at SVG design?
replies(3): >>46239593 #>>46241396 #>>46242586 #
15. nightshift1 ◴[] No.46238631[source]
benchmarks probably should not be used for so long.
16. ◴[] No.46239011{3}[source]
17. azinman2 ◴[] No.46239593[source]
Graphic designers?
18. alechewitt ◴[] No.46239729[source]
Nice work on these benchmarks Simon. I’ve followed your blog closely since your great talk at the AI Engineers World Fair, and I want to say thank you for all the high quality content you share for free. It’s become my primary source for keeping up to date.

I’ve been working on a few benchmarks to test how well LLMs can recreate interfaces from screenshots. (https://github.com/alechewitt/llm-ui-challenge). From my basic tests, it seems GPT-5.2 is slightly better at these UI recreations. For example, in the MS Word replica, it implemented the undo/redo buttons as well as the bold/italic formatting that GPT-5.1 handled, and it generally seemed a bit closer to the original screenshot (https://alechewitt.github.io/llm-ui-challenge/outputs/micros...).

In the VS Code test, it also added the tabs that weren’t visible in the screenshot! (https://alechewitt.github.io/llm-ui-challenge/outputs/vs_cod...).

replies(1): >>46239894 #
19. simonw ◴[] No.46239894[source]
That is a very good benchmark. Interesting to see GPT-5.2 delivering on the promise of better vision support there.
20. tkgally ◴[] No.46240577[source]
I added GPT-5.2 Pro to my pelican-alternatives benchmark for the first three prompts:

Generate an SVG of an octopus operating a pipe organ

Generate an SVG of a giraffe assembling a grandfather clock

Generate an SVG of a starfish driving a bulldozer

https://gally.net/temp/20251107pelican-alternatives/index.ht...

GPT-5.2 Pro cost about 80 cents per prompt through OpenRouter, so I stopped there. I don’t feel like spending that much on all thirty prompts.
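
For a rough sense of scale, here is a minimal sketch of that arithmetic, assuming the price stays flat at about 80 cents per prompt (the figure is approximate, and the names below are purely illustrative):

```python
# Rough cost estimate. Assumes a flat ~80 cents per prompt, which is only
# the approximate figure quoted above; real per-prompt costs will vary.
COST_PER_PROMPT_CENTS = 80   # "about 80 cents per prompt"
TOTAL_PROMPTS = 30           # "all thirty prompts"
PROMPTS_RUN = 3              # "the first three prompts"


def cost_cents(num_prompts: int) -> int:
    """Cost in whole cents for num_prompts at the assumed flat rate."""
    return num_prompts * COST_PER_PROMPT_CENTS


spent = cost_cents(PROMPTS_RUN)
full_run = cost_cents(TOTAL_PROMPTS)
remaining = full_run - spent

print(f"spent so far: ${spent / 100:.2f}")
print(f"full run:     ${full_run / 100:.2f}")
print(f"remaining:    ${remaining / 100:.2f}")
```

Working in whole cents keeps the sums exact instead of relying on floating-point dollars.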

replies(1): >>46241918 #
21. tootie ◴[] No.46240638[source]
Do you think the big guys are on to your game and have been adding extra pelicans to the training data?
22. culi ◴[] No.46241396[source]
Not SVG, but basically the same challenge:

https://clocks.brianmoore.com/

Kimi or DeepSeek are probably the best

23. getnormality ◴[] No.46241401[source]
Well, the variance is itself interesting.
24. smusamashah ◴[] No.46241918[source]
Hi, it doesn't have Gemini 3.5 Pro which seems to be the best at this
replies(1): >>46243080 #
25. KellyCriterion ◴[] No.46242586[source]
I've not seen any model that is good at graphic/SVG creation so far - all of the stuff mostly looks ugly and somewhat "synthetic-distorted".

And lately, Claude (web) started drawing ASCII charts from one day to the next instead of the colorful infographic-styled images it produced before (they were only slightly better than the ASCII charts)

26. svantana ◴[] No.46243080{3}[source]
That's probably because "Gemini 3.5 Pro" doesn't exist