GPT-5.2

(openai.com)

1019 points atgctg | 3 comments | 11 Dec 25 18:04 UTC

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

goobatrooba ◴[11 Dec 25 21:22 UTC] No.46237337[source]▶

I feel there is a point when all these benchmarks are meaningless. What I care about beyond decent performance is the user experience. There I have grudges with every single platform and the one thing keeping me as a paid ChatGPT subscriber is the ability to sort chats in "projects" with associated files (hello Google, please wake up to basic user-friendly organisation!)

But all of them:

* Lie far too often with confidence
* Refuse to stick to prompts (e.g. ChatGPT with the request to number each reply for easy cross-referencing; Gemini with a basic request to respond in a specific language)
* Refuse to express uncertainty or nuance (I asked ChatGPT to give me certainty %s, which it did for a while but then just forgot...?)
* Refuse to give me short answers without fluff or follow-up questions
* Refuse to stop complimenting my questions or disagreements with wrong/incomplete answers
* Don't quote sources consistently so I can check facts, even when I ask for it
* Refuse to make clear whether they rely on original documents or an internal summary of the document, until I point out errors
* ...

I also have substance gripes, but for me such basic usability points are really something all of the chatbots fail on abysmally. Stick to instructions! Stop creating walls of text for simple queries! Tell me when something is uncertain! Tell me if there's no data or info rather than making something up!

replies(6): >>46237455 #>>46239839 #>>46240780 #>>46241133 #>>46241957 #>>46242174 #

1. razster ◴[12 Dec 25 05:25 UTC] No.46241133[source]▶

>>46237337 #

Of the big three (OpenAI, Claude, and Google), none of the latest models are good. I've spent more time monitoring them than just enjoying them. I've found it easier to run my own local LLM. I gave the latest Gemini release another go, only for it to misspell words and drift off into a fantasy world after a few chats helping me restructure guides. ChatGPT has become lazy for some reason and randomly changes things I told it to ignore. Claude was doing great until the latest release, then it started getting lazy after 20k+ tokens. I tried keeping a guide to refresh it whenever it started forgetting, but that didn't help.

Locals are better; I can script and have them script for me to build a guide creation process. They don't forget because that is all they're trained on. I'm done paying for 'AI'.

replies(2): >>46241439 #>>46242213 #

2. striking ◴[12 Dec 25 06:35 UTC] No.46241439[source]▶

>>46241133 (TP) #

What's to stop you from using the APIs the way you'd like?

3. marcosscriven ◴[12 Dec 25 08:59 UTC] No.46242213[source]▶

>>46241133 (TP) #

What are your best local models, and what hardware do you run them on?
