
GPT-5.2

(openai.com)
1094 points by atgctg | 2 comments
goobatrooba | No.46237337
I feel there comes a point when all these benchmarks become meaningless. Beyond decent performance, what I care about is the user experience. There I have grudges with every single platform, and the one thing keeping me a paid ChatGPT subscriber is the ability to sort chats into "projects" with associated files (hello Google, please wake up to basic user-friendly organisation!)

But all of them:
* Lie far too often, and with confidence
* Refuse to stick to prompts (e.g. ChatGPT ignoring my request to number each reply for easy cross-referencing; Gemini ignoring a basic request to respond in a specific language)
* Refuse to express uncertainty or nuance (I asked ChatGPT to give me certainty percentages, which it did for a while but then just forgot...?)
* Refuse to give me short answers without fluff or follow-up questions
* Refuse to stop complimenting my questions, or my disagreements with their wrong/incomplete answers
* Don't quote sources consistently so I can check facts, even when I ask for it
* Refuse to make clear whether they rely on the original documents or on an internal summary of them, until I point out errors
* ...

I also have substance gripes, but for me these basic usability points are something all of the chatbots fail at abysmally. Stick to instructions! Stop creating walls of text for simple queries! Tell me when something is uncertain! Tell me if there's no data or info rather than making something up!

1. fleischhauf | No.46243336
I'm always impressed by how fast people get used to new things. A couple of years ago something like ChatGPT was completely impossible, and now people complain that it sometimes doesn't do what they told it to and sometimes lies. (I'm not saying your points aren't valid or that you shouldn't raise them.) Some of these points just aren't fixable yet due to technical limitations. A language model currently has no way to give an estimate of its own confidence. There is also no way to completely eliminate hallucinations (lies). Some more fundamental improvements are needed before this works reliably.
2. davebren | No.46244493
Your point would stand if the entire economy weren't being reshaped around this product and employees weren't being told to use it or lose their jobs.