Most active commenters

tedsanders(4)

OpenAI o3 and o4-mini

(openai.com)

Show context

_fat_santa ◴[16 Apr 25 17:24 UTC] No.43708027[source]▶

So at this point OpenAI has 6 reasoning models, 4 flagship chat models, and 7 cost optimized models. So that's 17 models in total and that's not even counting their older models and more specialized ones. Compare this with Anthropic that has 7 models in total and 2 main ones that they promote.

This is just getting to be a bit much, seems like they are trying to cover for the fact that they haven't actually done much. All these models feel like they took the exact same base model, tweaked a few things and released it as an entirely new model rather than updating the existing ones. In fact based on some of the other comments here it sounds like these are just updates to their existing model, but they release them as new models to create more media buzz.

replies(22): >>43708044 #>>43708100 #>>43708150 #>>43708219 #>>43708340 #>>43708462 #>>43708605 #>>43708626 #>>43708645 #>>43708647 #>>43708800 #>>43708970 #>>43709059 #>>43709249 #>>43709317 #>>43709652 #>>43709926 #>>43710038 #>>43710114 #>>43710609 #>>43710652 #>>43713438 #

kristofferR ◴[16 Apr 25 17:34 UTC] No.43708150[source]▶

>>43708027 #

To use that criticism for this release ain't really fair, as these will replace the old models (o3 will replace o1, o4-mini will replace o3-mini).

On a more general level - sure, but they aren't planning to use this release to add a larger number of models, it's just that deprecating/killing the old models can't be done overnight.

replies(1): >>43708470 #

1. drcongo ◴[16 Apr 25 17:59 UTC] No.43708470[source]▶

>>43708150 #

As someone who doesn't use anything OpenAI (for all the reasons), I have to agree with the GP. It's all baffling. Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

Once you get to this point you're putting the paradox of choice on the user - I used to use a particular brand toothpaste for years until it got to the point where I'd be in the supermarket looking at a wall of toothpaste all by the same brand with no discernible difference between the products. Why is one of them called "whitening"? Do the others not do that? Why is this one called "complete" and that one called "complete ultra"? That would suggest that the "complete" one wasn't actually complete. I stopped using that brand of toothpaste as it become impossible to know which was the right product within the brand.

If I was assessing the AI landscape today, where the leading models are largely indistinguishable in day to day use, I'd look at OpenAI's wall of toothpaste and immediately discount them.

replies(4): >>43708621 #>>43708737 #>>43708778 #>>43708895 #

2. mkozlows ◴[16 Apr 25 18:14 UTC] No.43708621[source]▶

>>43708470 (TP) #

They keep a lot of models around for backward compatibility for API users. This is confusing, but not inherently a bad idea.

3. petesergeant ◴[16 Apr 25 18:25 UTC] No.43708737[source]▶

>>43708470 (TP) #

> Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

Because if they removed access to o3-mini — which I have tested, costed, and built around — I would be very angry. I will probably switch to o4-mini when the time is right.

replies(1): >>43708953 #

4. louthy ◴[16 Apr 25 18:30 UTC] No.43708778[source]▶

>>43708470 (TP) #

You could develop an AI model to help pick the correct AI model.

Now you’ve got 18 problems.

replies(1): >>43709651 #

5. tedsanders ◴[16 Apr 25 18:42 UTC] No.43708895[source]▶

>>43708470 (TP) #

(I work at OpenAI.)

In ChatGPT, o4-mini is replacing o3-mini. It's a straight 1-to-1 upgrade.

In the API, o4-mini is a new model option. We continue to support o3-mini so that anyone who built a product atop o3-mini can continue to get stable behavior. By offering both, developers can test both and switch when they like. The alternative would be to risk breaking production apps whenever we launch a new model and shut off developers without warning.

I don't think it's too different from what other companies do. Like, consider Apple. They support dozens of iPhone models with their software updates and developer docs. And if you're an app developer, you probably want to be aware of all those models and docs as you develop your app (not an exact analogy). But if you're a regular person and you go into an Apple store, you only see a few options, which you can personalize to what you want.

If you have concrete suggestions on how we can improve our naming or our product offering, happy to consider them. Genuinely trying to do the best we can, and we'll clean some things up later this year.

Fun fact: before GPT-4, we had a unified naming scheme for models that went {modality}-{size}-{version}, which resulted in names like text-davinci-002. We considered launching GPT-4 as something like text-earhart-001, but since everyone was calling it GPT-4 anyway, we abandoned that system to use the name GPT-4 that everyone had already latched onto. Kind of funny how our unified naming scheme originally made room for 999 versions, but we didn't make it past 3.

replies(2): >>43709084 #>>43711162 #

6. TuxSH ◴[16 Apr 25 18:46 UTC] No.43708953[source]▶

>>43708737 #

They just did that, at least for chat

replies(1): >>43715065 #

7. daveguy ◴[16 Apr 25 18:58 UTC] No.43709084[source]▶

>>43708895 #

Have any of the models been deprecated? It seems like a deprecation plan and definition of timelines would be extraordinarily helpful.

I have not seen any sort of "If you're using X.122, upgrade to X.123, before 202X. If you're using X.120, upgrade to anything before April 2026, because the model will no longer be available on that date." ... Like all operating systems and hardware manufacturers have been doing for decades.

Side note, it's amusing that stable behavior is only available on a particular model with a sufficiently low temperature setting. As near-AGI shouldn't these models be smart enough to maintain consistency or improvement from version to version?

replies(1): >>43709273 #

8. tedsanders ◴[16 Apr 25 19:17 UTC] No.43709273{3}[source]▶

>>43709084 #

Yep, we have a page of announced API deprecations here: https://platform.openai.com/docs/deprecations

It's got all deprecations, ordered by date of announcement, alongside shutdown dates and recommended replacements.

Note that we use the term deprecated to mean slated for shutdown, and shutdown to mean when it's actually shut down.

In general, we try to minimize developer pain by supporting models for as long as we reasonably can, and we'll give a long heads up before any shutdown. (GPT-4.5-preview was a bit of an odd case because it was launched as a potentially temporary preview, so we only gave a 3-month notice. But generally we aim for much longer notice.)

replies(1): >>43710698 #

9. skygazer ◴[16 Apr 25 19:55 UTC] No.43709651[source]▶

>>43708778 #

I think you're trying to re-contextualize the old Standards joke, but I actually think you're right -- if a front end model could dispatch as appropriate to the best backend model for a given prompt, and turn everything into a high level sort of mixture of models, I think that would be great, and a great simplifying step. Then they can specialize and optimize all they want, CPU goes down, responses get better and we only see one interface.

replies(2): >>43709744 #>>43710769 #

10. louthy ◴[16 Apr 25 20:03 UTC] No.43709744{3}[source]▶

>>43709651 #

> I think you're trying to re-contextualize the old Standards joke

Regex joke [1], but the standards joke will do just fine also :)

[1] Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

11. meander_water ◴[16 Apr 25 21:39 UTC] No.43710698{4}[source]▶

>>43709273 #

On that page I don't see any mention of o3-mini. Is o3-mini a legacy model now which is slated to be deprecated later on?

replies(1): >>43711733 #

12. calmoo ◴[16 Apr 25 21:47 UTC] No.43710769{3}[source]▶

>>43709651 #

Isn't this basically the idea of agents?

replies(1): >>43746230 #

13. dmd ◴[16 Apr 25 22:46 UTC] No.43711162[source]▶

>>43708895 #

Any idea when v1/models will be updated? As of right now, https://api.openai.com/v1/models has "id": "o3-mini-2025-01-31" and "id": "o3-mini", but no just 'o3'.

replies(1): >>43711913 #

14. tedsanders ◴[17 Apr 25 00:16 UTC] No.43711733{5}[source]▶

>>43710698 #

Nothing announced yet.

Our hypothesis is that o4-mini is a much better model, but we'll wait to hear feedback from developers. Evals only tell part of the story, and we wouldn't want to prematurely deprecate a model that developers continue to find value in. Model behavior is extremely high dimensional, and it's impossible to prevent regression on 100% use cases/prompts, especially if those prompts were originally tuned to the quirks of the older model. But if the majority of developers migrate happily, then it may make sense to deprecate at some future point.

We generally want to give developers as stable as an experience as possible, and not force them to swap models every few months whether they want to or not. Personally, I want developers to spend >99% of their time thinking about their business and <1% of their time thinking about what the OpenAI API is requiring of them.

15. tedsanders ◴[17 Apr 25 00:47 UTC] No.43711913{3}[source]▶

>>43711162 #

Ah, I know this is a pain, but by default o3 is only available to developers on tiers 4–5.

If you're in tiers 1–3, you can still get access - you just need to verify your org with us here:

https://help.openai.com/en/articles/10910291-api-organizatio...

I recognize that verification is annoying, but we eventually had to resort to this as otherwise bad actors will create zillions of accounts to violate our policies and/or avoid paying via credit card fraud/etc.

replies(1): >>43712292 #

16. dmd ◴[17 Apr 25 01:49 UTC] No.43712292{4}[source]▶

>>43711913 #

Aha! Verified and now I see o3. Thanks.

17. petesergeant ◴[17 Apr 25 10:49 UTC] No.43715065{3}[source]▶

>>43708953 #

It seems clear to me I would have built an app around the API, not the chat window.

18. skygazer ◴[20 Apr 25 20:13 UTC] No.43746230{4}[source]▶

>>43710769 #

I don't believe so. I thought agents were go-do-that-complicated-interactive-thing autonomously on my behalf. But, more similar to tool use, except, with mixture of experts, each expert assumes the continuation of "participant identity" in the conversation, in that they're fed the whole context.

replies(1): >>43747222 #

19. calmoo ◴[20 Apr 25 23:10 UTC] No.43747222{5}[source]▶

>>43746230 #

Yeah you're right, I had a misunderstanding of the term.

↑