
321 points jhunter1016 | 2 comments | | HN request time: 0s | source
Roark66 ◴[] No.41878594[source]
>OpenAI plans to lose $5 billion this year

Let that sink in for anyone who has incorporated ChatGPT into their work routine to the point where their normal skills start to atrophy. Imagine in two years' time OpenAI goes bust and MS gets all the IP. Now you can't really do your work without ChatGPT, but its cost has been raised to what it really costs to run. Maybe $2k per month per person? And you get about 1h of use per day for that money, too...

I've been saying for ages that being a luddite and abstaining from using AI is not the answer (no one is tilling the fields with oxen anymore either). But it is crucial to at the very least retain locally 50% of the capability that hosted models like ChatGPT offer.

replies(20): >>41878631 #>>41878635 #>>41878683 #>>41878699 #>>41878717 #>>41878719 #>>41878725 #>>41878727 #>>41878813 #>>41878824 #>>41878984 #>>41880860 #>>41880934 #>>41881556 #>>41881938 #>>41882059 #>>41883046 #>>41883088 #>>41883171 #>>41885425 #
sebzim4500 ◴[] No.41878719[source]
The marginal cost of inference per token is lower than what OpenAI charges you (IIRC about half the price); they make a loss because of the enormous costs of R&D and training new models.
replies(4): >>41878823 #>>41878875 #>>41878927 #>>41879029 #
1. diggan ◴[] No.41878823[source]
Did OpenAI publish concrete numbers regarding this, or where are you getting this data from?
replies(1): >>41881067 #
2. lukeschlather ◴[] No.41881067[source]
https://news.ycombinator.com/item?id=41833287

This says 506 tokens/second for Llama 405B on a machine with 8x H200s, which you can rent for about $4/GPU-hour, so probably $40/hour for a server with enough GPUs. At that rate it can do ~1.8M tokens per hour. OpenAI charges $10/1M output tokens for GPT-4o. (Input tokens and cached tokens are cheaper, but these are just ballpark estimates.) So if GPT-4o were 405B, it might cost ~$20/1M output tokens to serve.
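The back-of-envelope math above can be sketched as a few lines of arithmetic. The throughput and rental figures are the comment's own assumptions (506 tokens/s on an 8x H200 node, padded to roughly $40/hour for the whole server), not measured numbers:

```python
# Ballpark inference cost for a 405B model, per the figures in the comment.
tokens_per_second = 506          # claimed Llama 405B throughput on 8x H200
node_cost_per_hour = 40.0        # ~$4/GPU-hour x 8, rounded up for the full server

tokens_per_hour = tokens_per_second * 3600
cost_per_million = node_cost_per_hour / (tokens_per_hour / 1e6)

print(f"{tokens_per_hour / 1e6:.2f}M tokens/hour")      # 1.82M tokens/hour
print(f"${cost_per_million:.2f} per 1M output tokens")  # $21.96 per 1M output tokens
```

So the ~$20/1M figure is just hourly server cost divided by hourly token output, rounded down.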

Now, OpenAI is a little vague, but they have implied that GPT-4o is actually only 60B-80B parameters. So they're probably selling it with a reasonable profit margin, assuming it can be served for about $5/1M output tokens at approximately 100B parameters.

And even if they were selling it at cost, I wouldn't be worried, because a couple of years from now Nvidia will release H300s that are at least 30% more efficient, and that will cause a profit margin to materialize without raising prices. So if I have a use case that works with today's models, I will be able to rent the same thing a year or two from now for roughly the same price.