
GPT-5.2

(openai.com)
1019 points by atgctg | 2 comments
tabletcorry ◴[] No.46234994[source]
Slight increase in model cost, but looks like benefits across the board to match.

  model     input   cached input   output   (per 1M tokens)
  gpt-5.2   $1.75   $0.175         $14.00
  gpt-5.1   $1.25   $0.125         $10.00
replies(4): >>46235066 #>>46235279 #>>46235579 #>>46235663 #
commandar ◴[] No.46235579[source]
In particular, the API pricing for GPT-5.2 Pro has me wondering what on earth the possible market for that model is beyond getting to claim a couple of percent higher benchmark performance in press releases.

>Input: $21.00 / 1M tokens

>Output: $168.00 / 1M tokens

That's the most "don't use this" pricing I've seen on a model.

https://openai.com/api/pricing/
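
To put the gap in concrete terms, here's a quick back-of-the-envelope at those list prices; the 10k-in / 2k-out request size is made up, just to show the ratio:

  # Rough cost per request at the listed prices (USD per 1M tokens).
  # The 10k-input / 2k-output request size is a hypothetical example.
  PRICES = {
      "gpt-5.2":     (1.75, 14.00),    # (input, output)
      "gpt-5.2-pro": (21.00, 168.00),
  }

  def request_cost(model, in_tok, out_tok):
      inp, out = PRICES[model]
      return (in_tok * inp + out_tok * out) / 1_000_000

  base = request_cost("gpt-5.2", 10_000, 2_000)      # ~$0.046
  pro  = request_cost("gpt-5.2-pro", 10_000, 2_000)  # ~$0.546
  print(f"{pro / base:.0f}x")                        # 12x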

replies(6): >>46235702 #>>46235715 #>>46235770 #>>46235776 #>>46235836 #>>46235911 #
aimanbenbaha ◴[] No.46235911[source]
Last year, o3-high did 88% on ARC-AGI-1 at more than $4,000 per task. This model, in its X-high configuration, scores 90.5% at just $11.64 per task.

General intelligence has gotten ridiculously less expensive. I don't know if it's compute and energy abundance, attention mechanisms getting more efficient, or both, but we have to acknowledge the bigger picture and relative prices.
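
For scale, the drop in cost per task implied by those quoted numbers (which I haven't independently verified):

  # Cost-per-task drop implied by the quoted ARC-AGI-1 numbers.
  old_cost, new_cost = 4000.0, 11.64  # USD per task
  print(f"~{old_cost / new_cost:.0f}x cheaper")  # ~344x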

replies(1): >>46236008 #
1. commandar ◴[] No.46236008[source]
Sure, but I'm confused by the pricing precisely because it doesn't exist in a vacuum.

Pro barely outperforms Thinking in OpenAI's published numbers, yet costs ~10x as much and carries an explicit disclaimer that responses take on the order of minutes.

If the published performance numbers are accurate, it seems like it'd be incredibly difficult to justify the premium.

At least on the surface, it looks like it exists mostly to juice benchmark claims.

replies(1): >>46238709 #
2. rvnx ◴[] No.46238709[source]
It could be using the same trick Grok used in its earlier versions: spin up 10 agents that work on the problem in parallel, then take the consensus answer. That would explain both the price and the latency.

Essentially a newbie trick that works really well but isn't efficient, while still looking like an amazing breakthrough.

(If someone knows the actual implementation, I'm curious.)
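
For anyone curious what that trick looks like in code, here's a minimal sketch of best-of-N sampling with majority voting; the solver is simulated, and nothing here is based on knowledge of OpenAI's or xAI's actual implementation:

  import random
  from collections import Counter
  from concurrent.futures import ThreadPoolExecutor

  N = 10  # number of parallel attempts

  def solve_once(prompt: str) -> str:
      # Stand-in for one independent model call; simulated here as a
      # noisy solver that answers correctly 60% of the time.
      return "42" if random.random() < 0.6 else random.choice(["41", "43"])

  def solve_with_consensus(prompt: str) -> str:
      # Run N attempts in parallel, then majority-vote the answers.
      # Cost scales roughly with N per query, which would fit the
      # roughly 10x price premium the thread is discussing.
      with ThreadPoolExecutor(max_workers=N) as pool:
          answers = list(pool.map(solve_once, [prompt] * N))
      return Counter(answers).most_common(1)[0][0]

  print(solve_with_consensus("hard problem"))  # almost always "42"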