GPT-5.2 | slacker news

They probably saw your complaint that 5.1 was too spartan and a regression (I had the same experience with 5.1 in the POV-Ray version - have yet to try 5.2 out...).

10. Stevvo ◴[11 Dec 25 20:40 UTC] No.46236849[source]▶

>>46235580 (TP) #

The variance is way too high for this test to have any value at all. I ran it 10 times, and each pelican on a bicycle was a better rendition than that; you could say about half of them were perfect.

replies(3): >>46237560 #>>46240319 #>>46241401 #

11. golly_ned ◴[11 Dec 25 21:40 UTC] No.46237560[source]▶

>>46236849 #

Compared to the other benchmarks, which are much more gameable, I trust PelicanBikeEval way more.

replies(2): >>46239011 #>>46239406 #

12. irthomasthomas ◴[11 Dec 25 21:56 UTC] No.46237762{3}[source]▶

>>46236719 #

I was expecting to see a pterodactyl :(

13. AstroBen ◴[11 Dec 25 22:03 UTC] No.46237862[source]▶

>>46235580 (TP) #

Seems to be getting more aerodynamic. A clear sign of AI intelligence

14. sroussey ◴[11 Dec 25 22:12 UTC] No.46237969[source]▶

>>46235580 (TP) #

What is good at SVG design?

replies(3): >>46239593 #>>46241396 #>>46242586 #

15. nightshift1 ◴[11 Dec 25 23:09 UTC] No.46238631[source]▶

>>46235580 (TP) #

Benchmarks probably should not be used for so long.

16. ◴[11 Dec 25 23:48 UTC] No.46239011{3}[source]▶

>>46237560 #

17. azinman2 ◴[12 Dec 25 00:53 UTC] No.46239593[source]▶

>>46237969 #

Graphic designers?

18. alechewitt ◴[12 Dec 25 01:17 UTC] No.46239729[source]▶

>>46235580 (TP) #

Nice work on these benchmarks, Simon. I’ve followed your blog closely since your great talk at the AI Engineer World's Fair, and I want to say thank you for all the high-quality content you share for free. It’s become my primary source for keeping up to date.

I’ve been working on a few benchmarks to test how well LLMs can recreate interfaces from screenshots (https://github.com/alechewitt/llm-ui-challenge). From my basic tests, it seems GPT-5.2 is slightly better at these UI recreations. For example, in the MS Word replica, it implemented the undo/redo buttons as well as the bold/italic formatting that GPT-5.1 handled, and it generally seemed a bit closer to the original screenshot (https://alechewitt.github.io/llm-ui-challenge/outputs/micros...).

In the VS Code test, it also added the tabs that weren’t visible in the screenshot! (https://alechewitt.github.io/llm-ui-challenge/outputs/vs_cod...).

replies(1): >>46239894 #

19. simonw ◴[12 Dec 25 01:41 UTC] No.46239894[source]▶

>>46239729 #

That is a very good benchmark. Interesting to see GPT-5.2 delivering on the promise of better vision support there.

20. tkgally ◴[12 Dec 25 03:30 UTC] No.46240577[source]▶

>>46235580 (TP) #

I added GPT-5.2 Pro to my pelican-alternatives benchmark for the first three prompts:

Generate an SVG of an octopus operating a pipe organ

Generate an SVG of a giraffe assembling a grandfather clock

Generate an SVG of a starfish driving a bulldozer

https://gally.net/temp/20251107pelican-alternatives/index.ht...

GPT-5.2 Pro cost about 80 cents per prompt through OpenRouter, so I stopped there. I don’t feel like spending that much on all thirty prompts.

replies(1): >>46241918 #

21. tootie ◴[12 Dec 25 03:40 UTC] No.46240638[source]▶

>>46235580 (TP) #

Do you think the big guys are on to your game and have been adding extra pelicans to the training data?

22. culi ◴[12 Dec 25 06:24 UTC] No.46241396[source]▶

>>46237969 #

Not SVG, but basically the same challenge:

https://clocks.brianmoore.com/

Kimi or DeepSeek are probably the best.

23. getnormality ◴[12 Dec 25 06:25 UTC] No.46241401[source]▶

>>46236849 #

Well, the variance is itself interesting.

24. smusamashah ◴[12 Dec 25 08:04 UTC] No.46241918[source]▶

>>46240577 #

Hi, it doesn't have Gemini 3.5 Pro, which seems to be the best at this.

replies(1): >>46243080 #

25. KellyCriterion ◴[12 Dec 25 10:02 UTC] No.46242586[source]▶

>>46237969 #

I've not seen any model that is good at graphic/SVG creation so far - all of the stuff mostly looks ugly and somewhat "synthetic-distorted".

And lately, Claude (web) started to draw ASCII charts from one day to the next instead of the colorful infographic-styled images it drew before (they were only slightly better than the ASCII charts).

26. svantana ◴[12 Dec 25 11:24 UTC] No.46243080{3}[source]▶

>>46241918 #

That's probably because "Gemini 3.5 Pro" doesn't exist