
GPT-5.2

(openai.com)
1019 points by atgctg | 2 comments
tabletcorry ◴[] No.46234994[source]
Slight increase in model cost, but looks like benefits across the board to match.

  model     input   cached input   output   (per 1M tokens)
  gpt-5.2   $1.75   $0.175         $14.00
  gpt-5.1   $1.25   $0.125         $10.00
replies(4): >>46235066 #>>46235279 #>>46235579 #>>46235663 #
commandar ◴[] No.46235579[source]
In particular, the API pricing for GPT-5.2 Pro has me wondering what on earth the possible market for that model is beyond getting to claim a couple of percent higher benchmark performance in press releases.

>Input: $21.00 / 1M tokens

>Output: $168.00 / 1M tokens

That's the most "don't use this" pricing I've seen on a model.

https://openai.com/api/pricing/
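
To put the gap in concrete terms, here's a quick back-of-the-envelope at those list prices; the 10k-in / 2k-out request size is made up, just to show the ratio:

  # Rough cost per request at the listed prices (USD per 1M tokens).
  # The 10k-input / 2k-output request size is a hypothetical example.
  PRICES = {
      "gpt-5.2":     (1.75, 14.00),    # (input, output)
      "gpt-5.2-pro": (21.00, 168.00),
  }

  def request_cost(model, in_tok, out_tok):
      inp, out = PRICES[model]
      return (in_tok * inp + out_tok * out) / 1_000_000

  base = request_cost("gpt-5.2", 10_000, 2_000)      # ~$0.046
  pro  = request_cost("gpt-5.2-pro", 10_000, 2_000)  # ~$0.546
  print(f"{pro / base:.0f}x")                        # 12x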

replies(6): >>46235702 #>>46235715 #>>46235770 #>>46235776 #>>46235836 #>>46235911 #
aimanbenbaha ◴[] No.46235911[source]
Last year, o3-high did 88% on ARC-AGI-1 at more than $4,000 per task. This model, in its X-high configuration, scores 90.5% at just $11.64 per task.

General intelligence has gotten ridiculously less expensive. I don't know if it's compute and energy abundance, attention mechanisms getting more efficient, or both, but we have to acknowledge the bigger picture and relative prices.
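
For scale, the drop in cost per task implied by those quoted numbers (which I haven't independently verified):

  # Cost-per-task drop implied by the quoted ARC-AGI-1 numbers.
  old_cost, new_cost = 4000.0, 11.64  # USD per task
  print(f"~{old_cost / new_cost:.0f}x cheaper")  # ~344x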

replies(1): >>46236008 #
1. commandar ◴[] No.46236008[source]
Sure, but I'm confused by the pricing precisely because it doesn't exist in a vacuum.

Pro barely outperforms Thinking in OpenAI's published numbers, yet costs ~10x as much and carries an explicit disclaimer that responses take on the order of minutes.

If the published performance numbers are accurate, it seems like it'd be incredibly difficult to justify the premium.

At least on the surface, it looks like it exists mostly to juice benchmark claims.

replies(1): >>46238709 #
2. rvnx ◴[] No.46238709[source]
It could be using the same trick Grok used in its earlier versions: spin up 10 agents that work on the problem in parallel, then take the consensus answer. That would explain both the price and the latency.

Essentially a newbie trick that works really well but isn't efficient, while still looking like an amazing breakthrough.

(If someone knows the actual implementation, I'm curious.)
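
For anyone curious what that trick looks like in code, here's a minimal sketch of best-of-N sampling with majority voting; the solver is simulated, and nothing here is based on knowledge of OpenAI's or xAI's actual implementation:

  import random
  from collections import Counter
  from concurrent.futures import ThreadPoolExecutor

  N = 10  # number of parallel attempts

  def solve_once(prompt: str) -> str:
      # Stand-in for one independent model call; simulated here as a
      # noisy solver that answers correctly 60% of the time.
      return "42" if random.random() < 0.6 else random.choice(["41", "43"])

  def solve_with_consensus(prompt: str) -> str:
      # Run N attempts in parallel, then majority-vote the answers.
      # Cost scales roughly with N per query, which would fit the
      # roughly 10x price premium the thread is discussing.
      with ThreadPoolExecutor(max_workers=N) as pool:
          answers = list(pool.map(solve_once, [prompt] * N))
      return Counter(answers).most_common(1)[0][0]

  print(solve_with_consensus("hard problem"))  # almost always "42"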