    GPT-5.2

    (openai.com)
    1019 points atgctg | 27 comments
    tenpoundhammer ◴[] No.46236826[source]
    I have been using ChatGPT a ton over the last few months and paying for the subscription. I've used it for coding, news, stock analysis, daily problems, and whatever else I could think of. I decided to give Gemini a go when version three came out to great reviews. Gemini handles every single one of my use cases much better and consistently gives better answers. This is especially true for situations where searching the web for current information is important; it makes sense that Google would be better at that. Also, OCR is phenomenal: ChatGPT can't read my bad handwriting, but Gemini can, easily. The only downsides are in the polish department: there are more app bugs, and I usually have to leave the app open or the session terminates. There are bugs with uploading photos. My biggest complaint is that all links get routed through Google Search and then I have to manipulate them, when they should go directly to the chosen website; this has to be some kind of internal org KPI nonsense. Overall, my conclusion is that ChatGPT has lost and won't catch up because of the search integration strength.
    replies(36): >>46236861 #>>46236896 #>>46236956 #>>46236971 #>>46236980 #>>46237123 #>>46237253 #>>46237258 #>>46237321 #>>46237407 #>>46237452 #>>46237531 #>>46237626 #>>46237654 #>>46237786 #>>46237888 #>>46237927 #>>46238237 #>>46238324 #>>46238527 #>>46238546 #>>46238828 #>>46239189 #>>46239400 #>>46239512 #>>46239719 #>>46239767 #>>46239999 #>>46240382 #>>46240656 #>>46240742 #>>46240760 #>>46240763 #>>46241303 #>>46241326 #>>46241523 #
    dmd ◴[] No.46237258[source]
    I consistently have exactly the opposite experience. ChatGPT seems extremely willing to do a huge number of searches, think about them, and then kick off more searches after that thinking, think about it, etc., etc. whereas it seems like Gemini is extremely reluctant to do more than a couple of searches. ChatGPT also is willing to open up PDFs, screenshot them, OCR them and use that as input, whereas Gemini just ignores them.
    replies(5): >>46237338 #>>46237556 #>>46237747 #>>46240627 #>>46241115 #
    1. nullbound ◴[] No.46237338[source]
    I will say that it is wild, if not somewhat problematic, that two users have such disparate views of seemingly the same product. I say that, but then I remember my own experience from just a few days ago. I don't pay for Gemini, but I have a paid ChatGPT sub. I tested both on the same product with seemingly the same prompt, and subbed ChatGPT subjectively beat Gemini in terms of scope, options and links to current decent deals.

    It seems (only seems, because I have not gotten around to testing it in any systematic way) that some variables, like context and what the model knows about you, may actually influence the quality (or lack thereof) of the response.

    replies(10): >>46237530 #>>46237782 #>>46238005 #>>46238426 #>>46238540 #>>46238609 #>>46238817 #>>46238824 #>>46239808 #>>46240331 #
    2. dmd ◴[] No.46237530[source]
    And I’d really like for Gemini to be as good or better, since I get it for free with my Workspace account, whereas I pay for chatgpt. But every time I try both on a query I’m just blown away by how vastly better chatgpt is, at least for the heavy-on-searching-for-stuff kinds of queries I typically do.
    3. martinpw ◴[] No.46237782[source]
    > I will say that it is wild, if not somewhat problematic that two users have such disparate views of seemingly the same product.

    This happens all the time on HN. Before opening this thread, I was expecting that the top comment would be 100% positive about the product or its competitor, and one of the top replies would be exactly the opposite, and sure enough...

    I don't know why it is. It's honestly a bit disappointing that the most upvoted comments often have the least nuance.

    replies(2): >>46237859 #>>46238338 #
    4. block_dagger ◴[] No.46237859[source]
    Replace "on HN" with "in the course of human events" and we may have a generally true statement ;)
    5. Workaccount2 ◴[] No.46238005[source]
    Gemini has tons of people using it free via aistudio

    I can't help but feel that google gives free requests the absolute lowest priority, greatest quantization, cheapest thinking budget, etc.

    I pay for gemini and chatGPT and have been pretty hooked on Gemini 3 since launch.

    6. stevage ◴[] No.46238338[source]
    How much nuance can one person's experience have? If the top two most visible things are detailed, contrary experiences of the same product, that seems a pretty good outcome?
    7. jhancock ◴[] No.46238426[source]
    I can use GPT one day and the next get a different experience with the same problem space. Same with Gemini.
    replies(2): >>46238643 #>>46238793 #
    8. blks ◴[] No.46238540[source]
    Because neither product has any consistency in its results, no predictive behaviour. One day it performs well, another it hallucinates non-existent facts and libraries. Those are stochastic machines.
    replies(1): >>46238992 #
    9. crorella ◴[] No.46238609[source]
    It’s like having 3 coins and users preferring one or the other when tossing it because one coin gives consistently more heads (or tails) than the other coin.

    What is better is to build a good set of rules and stick to one and then refine those rules over time as you get more experience using the tool or as the tool evolves and diverges from the results you expect.

    replies(1): >>46238668 #
    10. 4ndrewl ◴[] No.46238643[source]
    This is by design, given a non-deterministic application?
    replies(1): >>46238690 #
    11. nullbound ◴[] No.46238668[source]
    << What is better is to build a good set of rules and

    But, unless you are on a local model you control, you literally can't. Otherwise, good rules will work only as long as the next update allows. I will admit that makes me consider some other options, but those probably shouldn't be 'set and iterate' each time something changes.

    replies(1): >>46240430 #
    12. jhancock ◴[] No.46238690{3}[source]
    Sure. It may be more than that... possibly due to variable operating params on the servers and current load.

    On the whole, if I compare my AI assistant to a human worker, I get more variance than I would from a human office worker.

    replies(2): >>46238899 #>>46239210 #
    13. sjaramillo ◴[] No.46238793[source]
    I guess LLMs have a mood too
    replies(1): >>46239460 #
    14. rabf ◴[] No.46238817[source]
    ChatGPT is not one model! Unless you manually specify a particular model, your question can be routed to different models depending on what it guesses would be most appropriate for your question.
    replies(1): >>46240466 #
    15. nunez ◴[] No.46238824[source]
    Tesla FSD has been more or less the same experience. Some people drive 100s of miles without disengaging while others pull the plug within half a mile from their house. A lot of it depends on what the customer is willing to tolerate.
    16. pixl97 ◴[] No.46238899{4}[source]
    That's because you don't 'own' the LLM compute. If you instead bought your office workers by the question I'm sure the variability would increase.
    17. sendes ◴[] No.46238992[source]
    I see the hyperbole is the point, but surely what these machines do is to literally predict? The entire prompt engineering endeavour is to get them to predict better and more precisely. Of course, these are not perfect solutions - they are stochastic after all, just not unpredictably.
    replies(1): >>46240345 #
    18. astrange ◴[] No.46239210{4}[source]
    They're not really capable of producing varying answers based on load.

    But they are capable of producing different answers because they feel like behaving differently if the current date is a holiday, and things like that. They're basically just little guys.

    19. dr_dshiv ◴[] No.46239460{3}[source]
    Vibes
    20. Bombthecat ◴[] No.46239808[source]
    Could also be a language thing ...
    21. austhrow743 ◴[] No.46240331[source]
    We've been having trouble telling if people are using the same product ever since ChatGPT first got popular. They had a free model and a paid model, that was it, no other competitors or naming schemes to worry about, and discussions were still full of people talking about current capabilities without saying what model they were using.

    For me, "gemini" currently means using this model in the llm.datasette.io cli tool.

    openrouter/google/gemini-3-pro-preview

    For what anyone else means? If they're equivalent? If Google does something different when you use "Gemini 3" in their browser app vs their cli app vs plans vs api users vs third party api users? No idea to any of the above.

    I hate naming in the llm space.

    22. coliveira ◴[] No.46240345{3}[source]
    Prompt engineering is voodoo. There's no sure way to determine how well these models will respond to a question. Of course, giving additional information may be helpful, but even that is not guaranteed.
    replies(2): >>46241557 #>>46241667 #
    23. crorella ◴[] No.46240430{3}[source]
    What I had in mind when I added that comment was coding, with the use of .md files. For the web version of chats I agree there is little control over how to tailor the way you want the agent to behave, unless you give an initial "setup" prompt.
    24. stingraycharles ◴[] No.46240466[source]
    Isn’t that just standard MoE behavior? And isn’t the only choice you have from the UI between “Instant” and “Thinking”?
    replies(1): >>46241680 #
    25. lossyalgo ◴[] No.46241557{4}[source]
    Also every model update changes how you have to prompt them to get the answers you want. Setting up pre-prompts can help, but with each new version, you have to figure out through trial and error how to get it to respond to your type of queries.

    I can't wait to see how badly my finally sort-of-working ChatGPT 5.1 pre-prompts work with 5.2.

    Edit: How to talk to these models is actually documented, but you have to read through huge documents: https://cdn.openai.com/gpt-5-system-card.pdf

    26. baq ◴[] No.46241667{4}[source]
    It definitely isn’t voodoo, it’s more like forecasting weather. Some forecasts are easier to make, some are harder (it’ll be cold when it’s winter vs the exact location and wind speed of a tornado for an extreme example). The difference is you can try to mix things up in the prompt to maximize the likelihood of getting what you want out and there are feasibility thresholds for use cases, e.g. if you get a good answer 95% of the time it’s qualitatively different than 55%.
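To make the 95% vs. 55% threshold concrete, here is a minimal sketch. It assumes something the comment does not state: that a task chains several prompts, each succeeding independently with the same probability. The function name `chain_success` and the choice of five steps are hypothetical, picked only for illustration.

```python
def chain_success(p: float, steps: int) -> float:
    """Probability that every one of `steps` prompts gives a good answer,
    assuming each succeeds independently with probability p
    (an illustrative assumption, not a measured property of any model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return p ** steps


# Compare the two per-prompt success rates over a 5-step chain.
for p in (0.95, 0.55):
    print(f"p={p:.2f}: 5-step chain succeeds {chain_success(p, 5):.1%} of the time")
```

Under those assumptions the gap compounds quickly: roughly 77% for the 95% case versus about 5% for the 55% case, which is one way to read "qualitatively different".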
    27. baq ◴[] No.46241680{3}[source]
    MoE is a single model thing, model routing happens earlier.
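A rough sketch of that distinction, not how any vendor actually implements it: a router picks one whole model per request before inference starts, while an MoE gate picks a few experts per token inside a single model. Every name here (`route_request`, `moe_gate`, the model labels, the length heuristic) is hypothetical.

```python
import math


def route_request(prompt: str) -> str:
    """Request-level routing: choose ONE whole model before any inference runs.
    The heuristic and the model labels are made up for illustration."""
    looks_hard = len(prompt.split()) > 50 or "prove" in prompt.lower()
    return "thinking-model" if looks_hard else "instant-model"


def moe_gate(expert_scores: list[float], top_k: int = 2) -> dict[int, float]:
    """Token-level MoE gating inside a single model: keep the top_k experts
    for this token and softmax-normalize their scores into mixing weights."""
    ranked = sorted(range(len(expert_scores)),
                    key=lambda i: expert_scores[i], reverse=True)
    chosen = ranked[:top_k]
    exps = {i: math.exp(expert_scores[i]) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Routing happens once per request; gating happens per token within the model.
print(route_request("What's the capital of France?"))
print(moe_gate([0.1, 2.0, -1.0, 1.5]))
```

The two mechanisms can stack: a request may first be routed to a model, and that model may itself be an MoE that gates experts for every token it processes.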