
GPT-5.2

(openai.com)
1019 points by atgctg | 8 comments
mmaunder ◴[] No.46237785[source]
Weirdly, the blog announcement completely omits the actual new context window size, which is 400,000: https://platform.openai.com/docs/models/gpt-5.2

Can I just say !!!!!!!! Hell yeah! Blog post indicates it's also much better at using the full context.

Congrats OpenAI team. Huge day for you folks!!

Started on Claude Code and like many of you, had that omg CC moment we all had. Then got greedy.

Switched over to Codex when 5.1 came out. WOW. Really nice acceleration in my Rust/CUDA project, which is a gnarly one.

Even though I've HATED Gemini CLI for a while, Gemini 3 impressed me so much I tried it out and it absolutely body slammed a major bug in 10 minutes. Started using it to consult on commits. Was so impressed it became my daily driver. Huge mistake. I almost lost my mind after a week of fighting it. Insane bias towards action. Ignoring user instructions. Garbage characters in output. Absolutely no observability into its thought process. And on and on.

Switched back to Codex just in time for 5.1 codex max xhigh which I've been using for a week, and it was like a breath of fresh air. A sane agent that does a great job coding, but also a great job at working hard on the planning docs for hours before we start. Listens to user feedback. Observability on chain of thought. Moves reasonably quickly. And also makes it easy to pay them more when I need more capacity.

And then today GPT-5.2 with an xhigh mode. I feel like Xmas has come early. Right as I'm doing a huge Rust/CUDA/math-heavy refactor. THANK YOU!!

replies(8): >>46237912 #>>46238166 #>>46238297 #>>46240408 #>>46240891 #>>46241079 #>>46241471 #>>46241483 #
lopuhin ◴[] No.46238297[source]
Context window size of 400k is not new; gpt-5, 5.1, 5-mini, etc. have the same. But they do claim improved long-context performance, which, if true, would be great.
replies(1): >>46238435 #
1. energy123 ◴[] No.46238435[source]
But 400k was never usable in ChatGPT Plus/Pro subscriptions. It was nerfed down to 60-100k. If you submitted too long a prompt, they deleted the tokens at the end of it before calling the model. Or if the chat got too long (still below 100k, however), they deleted your first messages. This was 3 months ago.

Can someone with an active sub check whether we can submit a full 400k prompt (or at least 200k) without any prompt truncation in the backend? I don't mean attaching a file, which uses RAG.
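
For anyone who wants to try: a rough way to test is to generate filler text with a "needle" near the end of the prompt and see if the model can repeat it back. A minimal sketch, assuming tiktoken is installed; the codeword, filename, and token counts are just placeholders:

```python
# Probe for prompt truncation: build a long prompt with a "needle" near the end,
# paste it into the web UI (or send via the API), and ask for the needle back.
# If the tail is silently dropped, the model won't know the codeword.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

NEEDLE = "The secret codeword is PINEAPPLE-7731."
filler_line = "This is filler text to pad the prompt to the desired length.\n"

target_tokens = 200_000                      # placeholder target size
chunk = filler_line * 1000
chunk_tokens = len(enc.encode(chunk))

prompt_parts, total = [], 0
while total < target_tokens:
    prompt_parts.append(chunk)
    total += chunk_tokens

prompt = "".join(prompt_parts) + NEEDLE + "\nWhat is the secret codeword?"

with open("long_prompt.txt", "w") as f:      # paste the file contents into the chat
    f.write(prompt)
print(f"Wrote roughly {len(enc.encode(prompt))} tokens to long_prompt.txt")
```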

replies(3): >>46238928 #>>46239097 #>>46240022 #
2. gunalx ◴[] No.46238928[source]
API use was not nerfed in this way.
3. piskov ◴[] No.46239097[source]
Context windows for web:

Fast (GPT‑5.2 Instant): Free 16K, Plus/Business 32K, Pro/Enterprise 128K

Thinking (GPT‑5.2 Thinking): all paid tiers 196K

https://help.openai.com/en/articles/11909943-gpt-52-in-chatg...

replies(2): >>46239517 #>>46240258 #
4. dr_dshiv ◴[] No.46239517[source]
That’s… too bad
5. eru ◴[] No.46240022[source]
> Or if the chat got too long (still below 100k however) they deleted your first messages. This was 3 months ago.

I can believe that, but it also seems really silly? If your max context window is X and the chat has approached that, instead of outright deleting the first messages, why not have your model summarise the first quarter of tokens and place that summary at the beginning of the log you feed as context? Since the chat history is (mostly) immutable, this only adds a minimal overhead: you can cache the summarisation and don't have to redo it for each new message. (If the partially summarised log gets too long, you summarise again.)

Since I can come up with this technique in half a minute of thinking about the problem, and the OpenAI folks are presumably not stupid, I wonder what downside I'm missing.
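
For what it's worth, a minimal sketch of what I mean, against the OpenAI chat completions API. The model name, token limits, and summary prompt are placeholders, not what OpenAI actually does:

```python
# Rolling-summary sketch: when the log nears the window, replace its oldest
# quarter with a short summary and keep the rest verbatim.
from openai import OpenAI
import tiktoken

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")

MODEL = "gpt-4o-mini"              # placeholder model name
MAX_CONTEXT = 100_000              # assumed effective window
SUMMARY_TRIGGER = int(MAX_CONTEXT * 0.9)

def n_tokens(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

def summarize(messages):
    """One extra model call that condenses part of the conversation."""
    text = "\n".join(f'{m["role"]}: {m["content"]}' for m in messages)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": "Summarise this conversation, keeping facts and decisions:\n" + text}],
    )
    return resp.choices[0].message.content

def compact(history):
    """If the log is near the window, fold its oldest quarter into a cached summary."""
    if n_tokens(history) < SUMMARY_TRIGGER:
        return history
    cut = max(len(history) // 4, 1)
    summary = summarize(history[:cut])     # store this so it isn't recomputed per message
    return [{"role": "system",
             "content": "Summary of earlier conversation: " + summary}] + history[cut:]
```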

replies(1): >>46240282 #
6. energy123 ◴[] No.46240258[source]
But can you do that in one message, or is that a best-case scenario across a long multi-turn chat?
7. Aeolun ◴[] No.46240282[source]
Don’t think you are missing anything. I do this with the API, and it works great. I’m not sure why they don’t do it, but I can only guess it’s because it completely breaks the context caching. If you summarize the full buffer, at least you know you’re down to a few thousand tokens to cache again, instead of having to re-cache 100k tokens.
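Roughly what I mean by the full-buffer version, reusing the n_tokens() and summarize() helpers from the sketch upthread; the threshold is made up:

```python
# Full-buffer compaction: once the log crosses a threshold, collapse everything
# into one short summary, so the prefix that has to be re-cached afterwards is
# only a few thousand tokens instead of ~100k.
def compact_full(history, trigger=100_000):
    if n_tokens(history) < trigger:
        return history             # prefix cache stays valid, nothing to redo
    summary = summarize(history)   # one uncached call; afterwards the prefix is tiny
    return [{"role": "system",
             "content": "Summary of the conversation so far: " + summary}]
```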
replies(1): >>46242044 #
8. eru ◴[] No.46242044{3}[source]
> [...] but I can only guess it’s because it completely breaks the context caching.

Yes, but you only redo this every once in a while? It's a constant-factor overhead. Whereas if you just feed the last few thousand tokens, you get no caching at all (once the conversation is big enough that the 'last few thousand tokens' window doesn't cover the whole thing), since the prefix changes every turn?