The hardware for a local model would cost the equivalent of years and years of a $20/mo subscription, would output lower quality work, and would be much slower.
3.7 Thinking is an insane programming model. Maybe it cannot do an SWE's job, but it sure as hell can write functional narrow-scope programs with a GUI.
That name alone holds the most mindshare in its product category, approaching Google-level name recognition.
That being said, they have a user base and integrations. As long as they stay close to or a bit ahead of the Chinese models they'll be fine. If the Chinese models significantly jump ahead of them, well, then they are pretty much dead. Add open source to the mix and they become history.
In reality OpenAI is losing money per user.
Cost per token is tanking like crazy due to competition.
They guesstimate breaking even and then turning a profit in a couple of years.
Their guesses don't seem to account much for progress, especially on open-weight models.
Frankly I have no idea what they're thinking there – they can barely keep up even with an investor-subsidized, unsustainable business model.
For small guys and everyone else... it'll probably be cost-neutral to keep paying OpenAI, Google, etc. directly rather than paying some cloud provider to host an at-best on-par model at equivalent prices.
Local hosting on GPU only really makes sense if you're doing many hours of training/inference daily.
If I try other models, I basically end up with a much worse version of AI. Even as someone who uses Anthropic's APIs a lot, it's absolutely not worth trying to self-host. The APIs are much better and much cheaper.
Self-hosting AI might honestly be useful for 0.001% of people.
Also, "many hours of inference daily" may just mean you're doing your usual work while some background processing runs for hours or days, or that you've put together some reactive automation that fires constantly.
ps. local training rarely makes sense.
ps. 2: not sure where you got 50x slower from; a 4090 is actually faster than an A100, for example, and a 5090 is ~75% faster than a 4090.