Are OpenAI and Anthropic losing money on inference?

(martinalderson.com)


_sword ◴[28 Aug 25 17:53 UTC] No.45055003[source]▶

I've done the modeling on this a few times and I always get to a place where inference can run at 50%+ gross margins, depending mostly on GPU depreciation and how good the host is at optimizing utilization. The challenge for the margins is whether or not you count model training costs as part of the calculation. If model training isn't capitalized and amortized, margins are great. If training is amortized and has to be counted... yikes
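A rough sketch of that arithmetic in Python, for anyone who wants to play with it. Every number here is a made-up placeholder, not a figure from OpenAI, Anthropic, or any host, and the function names are just for illustration. It assumes straight-line GPU depreciation and spreads the training bill evenly over the fleet's GPU-hours for the model's serving lifetime.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_gpu_hour(gpu_price, depreciation_years, opex_per_hour):
    """Hourly cost of owning one GPU: straight-line depreciation plus opex."""
    return gpu_price / (depreciation_years * HOURS_PER_YEAR) + opex_per_hour


def training_cost_per_gpu_hour(training_cost, fleet_gpus, model_lifetime_years):
    """Training bill spread evenly over every GPU-hour the model is served."""
    return training_cost / (fleet_gpus * model_lifetime_years * HOURS_PER_YEAR)


def gross_margin(revenue_per_busy_hour, utilization, cost_per_hour,
                 training_per_hour=0.0):
    """Margin per wall-clock GPU-hour; idle hours cost money but earn nothing."""
    revenue = revenue_per_busy_hour * utilization
    return (revenue - cost_per_hour - training_per_hour) / revenue


# Hypothetical inputs, chosen only to show the shape of the result.
serving_cost = cost_per_gpu_hour(gpu_price=30_000, depreciation_years=4,
                                 opex_per_hour=0.60)
training = training_cost_per_gpu_hour(training_cost=500e6, fleet_gpus=10_000,
                                      model_lifetime_years=0.5)

inference_only = gross_margin(revenue_per_busy_hour=5.0, utilization=0.6,
                              cost_per_hour=serving_cost)
with_training = gross_margin(revenue_per_busy_hour=5.0, utilization=0.6,
                             cost_per_hour=serving_cost,
                             training_per_hour=training)

print(f"inference only: {inference_only:.0%}")
print(f"with training amortized: {with_training:.0%}")
```

With these placeholder inputs the inference-only margin comes out positive and the amortized one goes deeply negative. The result is very sensitive to utilization, the depreciation period, and the assumed model lifetime, so change those before reading anything into it.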

replies(7): >>45055030 #>>45055275 #>>45055536 #>>45055820 #>>45055835 #>>45056242 #>>45056523 #

BlindEyeHalo ◴[28 Aug 25 18:19 UTC] No.45055275[source]▶

>>45055003 #

Why wouldn't you factor in training? It is not like you can train once and then have the model run for years. You need to constantly improve to keep up with the competition. The lifespan of a model is just a few months at this point.

replies(7): >>45055303 #>>45055495 #>>45055624 #>>45055631 #>>45056110 #>>45056973 #>>45057517 #

1. vonneumannstan ◴[28 Aug 25 18:50 UTC] No.45055624[source]▶

>>45055275 #

I suspect we've already reached the point with models at the GPT5 tier where the average person will no longer recognize improvements and this model can be slightly improved at slow intervals and indeed run for years. Meanwhile research grade models will still need to be trained at massive cost to improve performance on relatively short time scales.

replies(4): >>45055819 #>>45056941 #>>45059324 #>>45059712 #

2. AJ007 ◴[28 Aug 25 19:07 UTC] No.45055819[source]▶

>>45055624 (TP) #

Whenever someone has complained to me about issues they are having with ChatGPT on a particular question or type of question, the first thing I do is ask them what model they are using. So far, no one has ever known offhand what model they were using, nor were they aware that there are more models!

If you understand there are multiple models from multiple providers, some of those models are better at certain things than others, and how you can get those models to complete your tasks, you are in the top 1% (probably less) of LLM users.

replies(2): >>45056421 #>>45057385 #

3. ◴[28 Aug 25 20:02 UTC] No.45056421[source]▶

>>45055819 #

4. black_knight ◴[28 Aug 25 20:55 UTC] No.45056941[source]▶

>>45055624 (TP) #

Strangely, I feel GPT-5 as the opposite of an improvement over the previous models, and consider just using Claude for actual work. Also, the voice mode went from really useful to useless: “Absolutely, I will keep it brief and give it to you directly. …some wrong answer… And there you have it! As simple as that!”

replies(1): >>45057339 #

5. vonneumannstan ◴[28 Aug 25 21:38 UTC] No.45057339[source]▶

>>45056941 #

>Strangely, I feel GPT-5 as the opposite of an improvement over the previous models

This is almost surely wrong, but my point was about GPT-5-level models in general, not GPT-5 specifically...

6. th0ma5 ◴[28 Aug 25 21:42 UTC] No.45057385[source]▶

>>45055819 #

This would be helpful if there were some kind of first principle by which to gauge that better-or-worse comparison, but there isn't one outside of people's value judgements like the one you're offering.

7. felipeerias ◴[29 Aug 25 02:18 UTC] No.45059324[source]▶

>>45055624 (TP) #

The "Pro" variant of GPT-5 is probably the best model around and most people are not even aware that it exists. One reason is that as models get more capable, they also get a lot more expensive to run, so this "Pro" variant is only available on the $200/month Pro plan.

At the same time, more capable models are also a lot more expensive to train.

The key point is that the relationship between all these magnitudes is not linear, so the economics of the whole thing start to look wobbly.

Soon we will probably arrive at a point where these huge training runs must stop, because the performance improvement does not match the huge cost increase, and because the resulting model would be so expensive to run that the market for it would be too small.

replies(1): >>45064826 #

8. ewoodrich ◴[29 Aug 25 03:10 UTC] No.45059712[source]▶

>>45055624 (TP) #

I may not qualify as an "average user" but I shudder imagining being stuck using a 1+ yr stale model for development given my experiences using a newer framework than what was available during training.

Passing in docs usually helps, but I've had some incredibly aggravating experiences where a model just absolutely cannot accept their "mental model" is incorrect and that they need to forget the tens of thousands of lines of out-of-date example code they've ingested during training. IMO it's an under-discussed aspect of the current effectiveness of LLM development thanks to the training arms race.

I recently had to fight Gemini to accept that a library (a Google developed AI library for JS, somewhat ironically) had just released a major version update with a lot of API changes that invalidated 99% of the docs and example code online. And boy was there a lot of old code floating around thanks to the vast amounts of SEO blog spam for anything AI adjacent.

replies(1): >>45064786 #

9. vonneumannstan ◴[29 Aug 25 14:42 UTC] No.45064786[source]▶

>>45059712 #

>Passing in docs usually helps, but I've had some incredibly aggravating experiences where a model just absolutely cannot accept their "mental model" is incorrect and that they need to forget the tens of thousands of lines of out-of-date example code they've ingested during training. IMO it's an under-discussed aspect of the current effectiveness of LLM development thanks to the training arms race.

I think you overestimate the amount of code turnover in 6-12 months...

10. vonneumannstan ◴[29 Aug 25 14:45 UTC] No.45064826[source]▶

>>45059324 #

>Soon we will probably arrive at a point where these huge training runs must stop, because the performance improvement does not match the huge cost increase, and because the resulting model would be so expensive to run that the market for it would be too small.

I think we're a lot more likely to get to the limit of power and compute available for training a bigger model before we get to the point where improvement stops.
