GPT-5.2
(openai.com)
1053 points
atgctg
| 2 comments |
11 Dec 25 18:04 UTC
https://platform.openai.com/docs/guides/latest-model
System card:
https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...
simonw
◴[
11 Dec 25 19:01 UTC
]
No.
46235580
[source]
▶
>>46234788 (OP)
#
Wow, there's a lot going on with this pelican riding a bicycle:
https://gist.github.com/simonw/c31d7afc95fe6b40506a9562b5e83...
replies(12):
>>46235608
#
>>46236119
#
>>46236455
#
>>46236615
#
>>46236751
#
>>46236849
#
>>46237862
#
>>46237969
#
>>46238631
#
>>46239729
#
>>46240577
#
>>46240638
#
Stevvo
◴[
11 Dec 25 20:40 UTC
]
No.
46236849
[source]
▶
>>46235580
#
The variance is way too high for this test to have any value at all. I ran it 10 times, and each pelican on a bicycle was a better rendition than that; you could say about half of them were perfect.
replies(3):
>>46237560
#
>>46240319
#
>>46241401
#
1.
golly_ned
◴[
11 Dec 25 21:40 UTC
]
No.
46237560
[source]
▶
>>46236849
#
Compared to the other benchmarks, which are much more gameable, I trust PelicanBikeEval way more.
replies(2):
>>46239011
#
>>46239406
#
2.
◴[
11 Dec 25 23:48 UTC
]
No.
46239011
[source]
▶
>>46237560 (TP)
#