
507 points martinald | 5 comments
1. moduspol No.45052841
This kind of presumes you're just cranking out inference non-stop 24/7 to get the estimated price, right? Or am I misreading this?

In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around during off-peak times. I guess they can power them off, but that's a significant difference from paying $2/hr to an all-in IaaS provider.

I'm also not sure we should expect their costs to just be "in-line with, or cheaper than" what various hourly H100 providers charge. Those providers presumably don't have to run entire datacenters filled to the gills with these specialized GPUs. It may be a lot more expensive to do that than to run a handful of them spread among the same datacenter with your other workloads.
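The arithmetic behind this objection is easy to sketch. If you pay a flat hourly rate for hardware that only does useful work part of the day, the effective cost per *busy* hour scales inversely with utilization. The $2/hr rate and the utilization levels below are illustrative assumptions, not anyone's real numbers:

```python
# Illustrative sketch: effective cost per busy hour rises as utilization falls.
# The $2/hr rate and the utilization figures are assumptions, not OpenAI's numbers.
HOURLY_RATE = 2.00  # $/hr for an H100, paid whether busy or idle

def effective_cost_per_busy_hour(utilization: float) -> float:
    """Cost per hour of actual inference when hardware is paid for 24/7."""
    return HOURLY_RATE / utilization

for u in (1.0, 0.5, 0.25):
    print(f"{u:.0%} utilization -> ${effective_cost_per_busy_hour(u):.2f} per busy hour")
# 100% -> $2.00, 50% -> $4.00, 25% -> $8.00
```

So a provider that can only fill its GPUs a quarter of the time effectively pays 4x the headline hourly rate per unit of work served.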

replies(4): >>45053067 >>45053222 >>45053374 >>45053784
2. GaggiX No.45053067
That's why they have the batch tier: https://platform.openai.com/docs/guides/batch
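For context, the batch tier linked above takes a JSONL file of requests and processes them within a completion window, which lets the provider schedule the work into otherwise-idle capacity. A minimal sketch of building that input file, per the linked docs (the prompts, `custom_id` values, and model name here are made-up examples):

```python
import json

# Sketch of the JSONL input format the OpenAI Batch API expects.
# Prompts, custom_ids, and the model name are illustrative, not real workloads.
def batch_line(custom_id: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    request = {
        "custom_id": custom_id,  # echoed back in results so you can match them up
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    return json.dumps(request)

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(["Summarize document A", "Summarize document B"]):
        f.write(batch_line(f"req-{i}", prompt) + "\n")
```

The file is then uploaded with `purpose="batch"` and submitted with a completion window (e.g. 24h), which is exactly the kind of deferrable demand that smooths out the peak/off-peak problem from the parent comment.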
3. lolc No.45053222
Of course it is impossible for us to know the true cost, but idle instances should not be accounted for at full price:

1. Idle instances don't turn electricity into heat, so that reduces their operating cost.

2. Idle instances can be borrowed for training which means flexible training amortizes peak inference capacity.
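Point 2 above can be made concrete: if training backfills the hours inference doesn't use, inference only carries its pro-rata share of the fixed hourly cost. The $2/hr rate and the time splits are assumptions for illustration:

```python
# Sketch: training backfill reduces the fixed cost attributed to inference.
# The hourly rate and the inference/training splits are made-up assumptions.
HOURLY_RATE = 2.00  # assumed all-in $/hr, paid whether the GPU is busy or idle

def inference_bill_per_hour(inference_frac: float, training_frac: float) -> float:
    """Fixed hourly cost attributed to inference, split pro-rata over busy time."""
    busy = inference_frac + training_frac
    return HOURLY_RATE * inference_frac / busy

# Inference busy 50% of the day, rest idle: inference carries the full $2/hr.
print(inference_bill_per_hour(0.5, 0.0))  # -> 2.0
# Training fills the other 50%: inference's share drops to $1/hr.
print(inference_bill_per_hour(0.5, 0.5))  # -> 1.0
```

In other words, flexible training converts peak inference capacity from a pure overhead into shared infrastructure.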

4. martinald No.45053374
Yes. But these are on-demand prices, so you could just turn them off when load is lower.

But there is no way OpenAI's costs should be higher than this. The main cost is the capex of the H100s, and if you're buying 100k at a time you should be getting a significant discount off list price.
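A back-of-envelope amortization shows why the capex dominates. Every number below is an assumption for illustration (rough list price, a hypothetical bulk discount, straight-line depreciation), not a known OpenAI figure:

```python
# Back-of-envelope H100 capex amortization; all figures are assumptions.
LIST_PRICE = 30_000.0   # rough H100 list price in $ (assumption)
BULK_DISCOUNT = 0.25    # hypothetical discount for a 100k-unit order
LIFETIME_YEARS = 3      # assumed straight-line depreciation period
HOURS = LIFETIME_YEARS * 365 * 24  # 26,280 hours

capex_per_hour = LIST_PRICE * (1 - BULK_DISCOUNT) / HOURS
print(f"${capex_per_hour:.2f}/hr capex alone")  # ~ $0.86/hr before power, DC costs, margin
```

Even with generous assumptions, the amortized capex lands well under the $2/hr on-demand rates, which is the gap the parent comment is pointing at.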

5. empath75 No.45053784
> In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider.

They can repurpose those nodes for training when they aren't being used for inference. Or if they're using public cloud nodes, just turn them off.