
GPT-5.2

(openai.com)
1019 points by atgctg | 37 comments
1. mmaunder ◴[] No.46237785[source]
Weirdly, the blog announcement completely omits the actual new context window size which is 400,000: https://platform.openai.com/docs/models/gpt-5.2

Can I just say !!!!!!!! Hell yeah! Blog post indicates it's also much better at using the full context.

Congrats OpenAI team. Huge day for you folks!!

Started on Claude Code and like many of you, had that omg CC moment we all had. Then got greedy.

Switched over to Codex when 5.1 came out. WOW. Really nice acceleration in my Rust/CUDA project which is a gnarly one.

Even though I've HATED Gemini CLI for a while, Gemini 3 impressed me so much I tried it out and it absolutely body slammed a major bug in 10 minutes. Started using it to consult on commits. Was so impressed it became my daily driver. Huge mistake. I almost lost my mind after a week of fighting it. Insane bias towards action. Ignoring user instructions. Garbage characters in output. Absolutely no observability into its thought process. And on and on.

Switched back to Codex just in time for 5.1 codex max xhigh which I've been using for a week, and it was like a breath of fresh air. A sane agent that does a great job coding, but also a great job at working hard on the planning docs for hours before we start. Listens to user feedback. Observability on chain of thought. Moves reasonably quickly. And also makes it easy to pay them more when I need more capacity.

And then today GPT-5.2 with an xhigh mode. I feel like Xmas has come early. Right as I'm doing a huge Rust/CUDA/math-heavy refactor. THANK YOU!!

replies(8): >>46237912 #>>46238166 #>>46238297 #>>46240408 #>>46240891 #>>46241079 #>>46241471 #>>46241483 #
2. twisterius ◴[] No.46237912[source]
[flagged]
replies(1): >>46238825 #
3. freedomben ◴[] No.46238166[source]
I haven't done a ton of testing due to cost, but so far I've actually gotten worse results with xhigh than high with gpt-5.1-codex-max. Made me wonder if it was somehow a PEBKAC error. Have you done much comparison between high and xhigh?
replies(3): >>46238482 #>>46238491 #>>46238659 #
4. lopuhin ◴[] No.46238297[source]
Context window size of 400k is not new; gpt-5, 5.1, 5-mini, etc. have the same. But they do claim they improved long-context performance, which if true would be great.
replies(1): >>46238435 #
5. energy123 ◴[] No.46238435[source]
But 400k was never usable in ChatGPT Plus/Pro subscriptions. It was nerfed down to 60-100k. If you submitted too long a prompt, they deleted the tokens at the end of your prompt before calling the model. Or if the chat got too long (still below 100k, however), they deleted your first messages. This was 3 months ago.

Can someone with an active sub check whether we can submit a full 400k prompt (or at least 200k) and there is no prompt truncation in the backend? I don't mean attaching a file, which uses RAG.

replies(3): >>46238928 #>>46239097 #>>46240022 #
6. tekacs ◴[] No.46238482[source]
I found the same with Max xhigh. To the point that I switched back to just 5.1 High from 5.1 Codex Max. Maybe I should’ve tried Max high first.
7. dudeinhawaii ◴[] No.46238491[source]
This is one of those areas where I think it's about the complexity of the task. What I mean is, if you set codex to xhigh by default, you're wasting compute. If you're setting it to xhigh when troubleshooting a complex memory bug or something, you're presumably more likely to get a quality response.

I think in general, medium ends up being the best all-purpose setting, while high+ is good for single-task deep-dives. Or at least that has been my experience so far. You can theoretically let it work longer on a harder task as well.

A lot appears to depend on the problem and problem domain unfortunately.

I've used max in problem sets as diverse as "troubleshooting Cyberpunk mods" and figuring out a race condition in a server backend. In those cases, it did a pretty good job of exhausting the available data (finding all available logs, digging into lua files) and narrowing down a bug that every other model failed to get.

I guess in some sense you have to know from the outset that it's a "hard problem". That in and of itself is subjective.

replies(1): >>46240702 #
8. robotswantdata ◴[] No.46238659[source]
For a few weeks the Codex model has been cursed. Recommend sticking with 5.1 high; 5.2 feels good too, but it's early days.
9. mmaunder ◴[] No.46238825[source]
My name is Mark Maunder. Not the fisheries expert. The other one when you google me. I’m 51 and as skeptical as you when it comes to tech. I’m the CTO of a well-known cybersecurity company and merely a user of AI.

Since you critiqued my post, allow me to reciprocate: I sense the same deflector shields in you as in many others here. I’d suggest embracing these products with a sense of optimism until proven otherwise; I’ve found that path leads to some amazing discoveries, and to moments where you realize how important and exciting this tech really is. Try out math that is too hard for you, or programming languages that are labor intensive, or languages that you don’t know. As the GitHub CEO said: this technology lets you increase your ambition.

replies(4): >>46239436 #>>46239796 #>>46239880 #>>46240279 #
10. gunalx ◴[] No.46238928{3}[source]
API use was not nerfed in this way.
11. piskov ◴[] No.46239097{3}[source]
Context windows for web:

Fast (GPT‑5.2 Instant): Free: 16K; Plus/Business: 32K; Pro/Enterprise: 128K

Thinking (GPT‑5.2 Thinking): All paid tiers: 196K

https://help.openai.com/en/articles/11909943-gpt-52-in-chatg...

replies(2): >>46239517 #>>46240258 #
12. bgwalter ◴[] No.46239436{3}[source]
I have tried the models and in domains I know well they are pathetic. They remove all nuance, make errors that non-experts do not notice and generally produce horrible code.

It is even worse in non-programming domains, where they chop up 100 websites and serve you incorrect bland slop.

If you are using them as a search helper, that sometimes works, though 2010 Google produced better results.

Oracle dropped 11% today due to over-investment in OpenAI. Non-programmers are acutely aware of what is going on.

replies(4): >>46239781 #>>46239962 #>>46240292 #>>46241348 #
13. dr_dshiv ◴[] No.46239517{4}[source]
That’s… too bad
14. jfreds ◴[] No.46239781{4}[source]
> they remove all nuance

Said in a sweeping generalization with zero sense of irony :D

replies(1): >>46241229 #
15. bluefirebrand ◴[] No.46239796{3}[source]
[flagged]
replies(1): >>46240042 #
16. GolfPopper ◴[] No.46239880{3}[source]
Replace 'products' with 'message', 'tech' with 'religion' and 'CEO' with 'prophet' and you have a bog-standard cult recruitment pitch.
replies(1): >>46240297 #
17. what-the-grump ◴[] No.46239962{4}[source]
You pretend that humans don’t produce slop?

I can recognize the shortcomings of AI code, but it can produce a mock or a full-blown class before I can even find a place to save the file it produced.

Pretending that we are all busy writing novel works of genius is silly; 99% of us are writing CRUD tasks and basic business flows. The code isn’t going to be perfect, and it doesn’t need to be, but it will get the job done.

All the logical gotchas of the workflows that you’d spend hours refactoring are done in minutes.

Use Pro with search… are you going to read 200 pages of documentation in 7 minutes, come up with a conclusion, and validate or invalidate it in another 5? No, you’re still trying to accept the cookie prompt on your 6th result.

You might as well join the flat-earth society if you still think that AI can’t help you complete day-to-day tasks.

18. eru ◴[] No.46240022{3}[source]
> Or if the chat got too long (still below 100k however) they deleted your first messages. This was 3 months ago.

I can believe that, but it also seems really silly? If your max context window is X and the chat has approached that, instead of outright deleting the first messages, why not have your model summarise the first quarter of tokens and place that summary at the beginning of the log you feed as context? Since the chat history is (mostly) immutable, this only adds minimal overhead: you can cache the summarisation and don't have to redo it for each new message. (If the partially summarised log gets too long, you summarise again.)

Since I can come up with this technique in half a minute of thinking about the problem, and the OpenAI folks are presumably not stupid, I wonder what downside I'm missing.
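A minimal sketch of what that rolling summarisation could look like, assuming a chat-style message list and hypothetical summarize/count_tokens helpers (not any real API):

```python
# Rolling summarisation sketch: when the history exceeds a token budget,
# fold the oldest quarter of messages into a cached one-paragraph summary.
# `summarize` and `count_tokens` are hypothetical helpers, not real API calls.

def compact_history(messages, summary, budget, summarize, count_tokens):
    def total():
        return count_tokens(summary) + sum(count_tokens(m["content"]) for m in messages)

    while total() > budget and len(messages) > 4:
        cut = len(messages) // 4                      # oldest quarter of the log
        summary = summarize(summary, messages[:cut])  # old turns never change, so this result is cacheable
        messages = messages[cut:]
    return messages, summary

def build_prompt(messages, summary):
    # Prepend the summary once, so the model still "remembers" the early turns.
    prefix = [{"role": "system", "content": "Summary of earlier conversation: " + summary}] if summary else []
    return prefix + messages
```

Since re-summarisation only fires when the budget is exceeded, the extra work is amortised over many turns rather than paid on every message.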

replies(1): >>46240282 #
19. eru ◴[] No.46240042{4}[source]
Maybe you are holding it wrong?

Contemporary LLMs still have huge limitations and downsides, just like a hammer or a saw has limitations. But millions of people are getting good value out of them already (LLMs and hammers and saws alike). I find it hard to believe that they are all deluded.

replies(1): >>46240455 #
20. energy123 ◴[] No.46240258{4}[source]
But can you do that in one message, or is that a best-case scenario in a long multi-turn chat?
21. Aeolun ◴[] No.46240282{4}[source]
Don’t think you are missing anything. I do this with the API, and it works great. I’m not sure why they don’t do it, but I can only guess it’s because it completely breaks the context caching. If you summarize the full buffer, at least you know you’re down to a few thousand tokens to cache again, instead of 100k.
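A toy comparison of the re-caching cost being described here, with made-up token counts:

```python
# Illustrative numbers only: how much has to be re-cached after the prompt prefix changes.

HISTORY = 100_000   # tokens in the conversation so far
SUMMARY = 3_000     # tokens after summarising the whole buffer
TRIM    = 10_000    # tokens dropped if the oldest messages are simply deleted

# Deleting the head still changes the prefix, so the remaining ~90k tokens
# miss the cache on the next call.
recache_after_trim = HISTORY - TRIM     # 90_000

# Summarising collapses the prefix to a few thousand tokens, so only those
# need to be cached again.
recache_after_summary = SUMMARY         # 3_000

print(recache_after_trim, recache_after_summary)
```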
replies(1): >>46242044 #
22. muppetman ◴[] No.46240292{4}[source]
Exactly this. It's like reading the news! It seems perfectly fine until you hit an article in a domain you have intimate knowledge of, and then you realise how bad and hacked-together the news is. AI feels just like that. But AI can improve, so I'm in the middle with my optimism.
23. Aeolun ◴[] No.46240297{4}[source]
Because most recruitment pitches are the same regardless of the subject.
24. tgtweak ◴[] No.46240408[source]
Have been on the 1M context window with Claude since 4.0 - it gets pretty expensive when you run 1M context on a long-running project (mostly using it in Cline for coding). I think they've realized more context length = more $ when dealing with most agentic coding workflows on the API.
replies(1): >>46240628 #
25. skydhash ◴[] No.46240455{5}[source]
What limitations does a hammer have if the job is hammering? Or a saw with sawing? Even `ed` doesn't have any issue with editing text files.
replies(1): >>46242031 #
26. Workaccount2 ◴[] No.46240628[source]
You should be doing everything you can to keep context under 200k, ideally even 100k. All the models unwind so badly as context grows.
replies(1): >>46241307 #
27. mmaunder ◴[] No.46240633{4}[source]
That's like telling a pig to become a pork producer.
28. wahnfrieden ◴[] No.46240702{3}[source]
You should also be making handoffs to/from Pro
29. Suppafly ◴[] No.46240891[source]
>Can I just say !!!!!!!! Hell yeah!

...

>THANK YOU!!

Man you're way too excited.

30. nathants ◴[] No.46241079[source]
Usable input limit has not changed, and remains 400k - 128k (reserved for output) = 272k tokens. Confirmed by checking the codex CLI source for any changes: nope.
31. jrflowers ◴[] No.46241229{5}[source]
This is a good point. It is a sweeping generalization only if you do not read the sentence that comes before that quote.
32. patates ◴[] No.46241307{3}[source]
I don't have that experience with Gemini. Up to 90% full, it's just fine.
33. re-thc ◴[] No.46241348{4}[source]
> Oracle dropped 11% today due to over-investment in OpenAI

Not even remotely true. Oracle is building out infrastructure mostly for AI workloads. It dropped because it couldn’t explain its financing and whether the investment was worth it. OpenAI or not wouldn’t have mattered.

34. lhl ◴[] No.46241471[source]
Anecdotally, I will say that for my toughest jobs GPT-5+ High in `codex` has been the best tool I've used - CUDA->HIP porting, finding bugs in torch, websockets, etc.; it's able to test, reason deeply, and find bugs. It can't write UI code for its life, however.

Sonnet/Opus 4.5 is faster, generally feels like a better coder, and makes much prettier TUI/FEs, but in my experience, for anything tough, any time it tells you it understands now, it really doesn't...

Gemini 3 Pro is unusable - I've found the same thing: opinionated in the worst way, unreliable, doesn't respect my AGENTS.md, and for my real-world problems I don't think it's actually solved anything that I can't get through w/ GPT (although I'll say that I wasn't impressed w/ Max; hopefully 5.2 xhigh improves things). I've heard it can do some magic from colleagues working on FE, but I'll just have to take their word for it.

35. ubutler ◴[] No.46241483[source]
> Weirdly, the blog announcement completely omits the actual new context window size which is 400,000: https://platform.openai.com/docs/models/gpt-5.2

As @lopuhin points out, they already claimed that context window for previous iterations of GPT-5.

The funny thing is though, I'm on the business plan, and none of their models, not GPT-5, GPT-5.1, GPT-5.2, GPT-5.2 Extended Thinking, GPT-5.2 Pro, etc., can really handle inputs beyond ~50k tokens.

I know because, when working with a really long Python file (>5k LoC), it often claims there is a bug because, somewhere close to the end, the file it sees is cut off and reads as '...'.

Gemini 3 Pro, by contrast, can genuinely handle long contexts.

36. eru ◴[] No.46242031{6}[source]
Well, ask the people who invented better hammers or better saws. Or better text editors than ed.
37. eru ◴[] No.46242044{5}[source]
> [...] but I can only guess it’s because it completely breaks the context caching.

Yes, but you only re-do this every once in a while? It's a constant-factor overhead. Whereas if you just feed the last few thousand tokens, you get no caching at all (once the conversation is big enough that a window of 'the last few thousand tokens' no longer covers the whole thing).
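A back-of-the-envelope version of that constant-factor claim, with made-up numbers:

```python
# Toy amortisation check: re-summarise whenever the un-summarised tail
# grows by CHUNK tokens. All figures are illustrative, not real limits.

CHUNK = 50_000      # new tokens accumulated between re-summarisations
SUMMARY = 3_000     # size the rolling summary collapses to
CONVO = 400_000     # total tokens ever produced in the conversation

passes = CONVO // CHUNK               # ~8 summarisation calls over the chat's lifetime
extra = passes * (CHUNK + SUMMARY)    # tokens the summariser has to re-read in total

print(passes, extra, round(extra / CONVO, 2))   # 8 passes, roughly 1.06x the conversation, once
```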