    GPT-5.2

    (openai.com)
    1019 points by atgctg | 12 comments
    zone411 No.46236209
    I've benchmarked it on the Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/):

    The high-reasoning version of GPT-5.2 improves on GPT-5.1: 69.9 → 77.9.

    The medium-reasoning version also improves: 62.7 → 72.1.

    The no-reasoning version also improves: 22.1 → 27.5.

    Gemini 3 Pro and Grok 4.1 Fast Reasoning still score higher.
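
    To put the three before/after pairs above on a common footing, here is a minimal sketch that computes the absolute and relative gain for each reasoning setting. It uses only the scores quoted in this comment; the dictionary layout and the `gains` helper are illustrative names, not part of the benchmark's own code, and it assumes all scores are on the same scale.

```python
# Scores quoted in the comment above: (GPT-5.1, GPT-5.2) per reasoning setting.
# The structure and names here are illustrative, not taken from the benchmark repo.
scores = {
    "high": (69.9, 77.9),
    "medium": (62.7, 72.1),
    "none": (22.1, 27.5),
}


def gains(old, new):
    """Return (absolute gain in points, relative gain in percent)."""
    return new - old, (new - old) / old * 100


for level, (old, new) in scores.items():
    absolute, relative = gains(old, new)
    print(f"{level:>6}: +{absolute:.1f} points ({relative:.1f}% relative)")
```

    On these numbers the no-reasoning version has the smallest absolute gain but the largest relative one, since it starts from a much lower baseline.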

    replies(4): >>46236325 #>>46236642 #>>46237650 #>>46241682 #
    1. scrollop No.46237650
    Why no Grok 4.1 reasoning?
    replies(1): >>46239494 #
    2. sanex No.46239494
    Do people other than Elon fans use Grok? Honest question. I've never tried it.
    replies(8): >>46240950 #>>46241184 #>>46241391 #>>46241742 #>>46241796 #>>46241902 #>>46242564 #>>46242875 #
    3. mac-attack No.46240950
    I can't understand why people would trust a CEO who regularly lies about product timelines, product features, his own personal life, etc. And that's before politicizing his entire kingdom by literally becoming a part of government and one of the larger donors to the current administration.
    replies(3): >>46241645 #>>46241905 #>>46242316 #
    4. bumling No.46241184
    I dislike Musk, and use Grok. I find it most useful for analyzing text to help check if there's anything I've missed in my own reading. Having it built in to Twitter is convenient and it has a generous free tier.
    5. buu700 No.46241391
    I use Grok pretty heavily, and Elon doesn't factor into it any more than Sam and Sundar do when I use GPT and Gemini. A few use cases where it really shines:

    * Research and planning

    * Writing complex isolated modules, particularly when the task depends on using a third-party API correctly (or even choosing an API/library at its own discretion)

    * Reasoning through complicated logic, particularly in cases that benefit from its eagerness to throw a ton of inference at problems where other LLMs might give a shallower or less accurate answer without more prodding

    I'll often fire off an off-the-cuff message from my phone to have Grok research some obscure topic that involves finding very specific data and crunching a bunch of numbers, or write a script for some random thing that I would previously never have bothered to spend time automating, and it'll churn for ~5 minutes on reasoning before giving me exactly what I wanted with few or no mistakes.

    As far as development goes, I personally get a lot of mileage out of collaborating with Grok and Gemini on planning/architecture/specs and coding with GPT. (I've stopped using Claude since GPT seems interchangeable at lower cost.)

    For reference, I'm only referring to the Grok chatbot right now. I've never actually tried Grok through agentic coding tooling.

    6. lkjdsklf No.46241645{3}
    If we stopped using the products of every company whose CEO lied about them, we'd all be sitting in caves staring at the dirt.
    7. jbm No.46241742
    I use a few AIs together to examine the same code base. I find Grok better than some of the Chinese ones I've used, but it isn't in the same league as Claude or Codex.
    8. ralusek No.46241796
    The only thing I use Grok for is when there's a current event/meme that I keep seeing referenced and don't understand; it's good at pulling from tweets.
    9. fatata123 No.46241905{3}
    Because not everyone makes their decisions through the prism of politics.
    10. delaminator No.46242316{3}
    You’re not narrowing it down.
    11. wdroz No.46242564
    Unlike OpenAI, you can use the latest Grok models without verifying your organization and giving your ID.
    12. sz4kerto No.46242875
    I'm using Gemini in general, but Grok too. That's because sometimes Gemini Thinking is too slow, but Fast can get confused a lot. Grok strikes a nice balance between being quite smart (not Gemini 3 Pro level, but close) and very fast.