Deploying DeepSeek on 96 H100 GPUs

(lmsys.org)

281 points GabrielBianconi | 2 comments | 29 Aug 25 14:07 UTC

caminanteblanco ◴[29 Aug 25 15:23 UTC] No.45065331

There was some tangentially related discussion in this post: https://news.ycombinator.com/item?id=45050415, but this cost analysis answers so many questions and gives me a better idea of how large a margin on inference many of these providers could be taking. Plus, I'm sure that Google or OpenAI can get more favorable data center rates than the average Joe Schmoe.

A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr. With 188 million input tokens/hr and 80 million output tokens/hr, that comes out to around $2/million input tokens, and $4.70/million output tokens.

This is actually a lot more than DeepSeek R1's rates of $0.10-$0.60/million input and $2/million output, but I'm sure major providers are not paying AWS p5 on-demand pricing.

Edit: those throughput figures were per node, so the actual input and output prices would be divided by 12: $0.17/million input tokens and $0.39/million output tokens.
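The corrected per-node arithmetic can be checked with a short Python sketch. It assumes only the figures quoted in this comment ($31.40/hr for an 8xH100 node, and 188M input / 80M output tokens per hour per node). The names are illustrative. As in the comment, each rate divides the full node cost by one token type alone, so the two rates are not meant to be added together.

```python
# Figures quoted in the comment above; treated as assumptions, not measurements.
NODE_COST_PER_HOUR = 31.40      # USD, one 8xH100 node at the quoted AWS rate
NODES = 12                      # 96 GPUs / 8 GPUs per node
INPUT_MTOK_PER_NODE_HOUR = 188  # million input tokens per node per hour
OUTPUT_MTOK_PER_NODE_HOUR = 80  # million output tokens per node per hour


def cost_per_million(cost_per_hour: float, mtok_per_hour: float) -> float:
    """USD per million tokens, given an hourly cost and millions of tokens per hour."""
    return cost_per_hour / mtok_per_hour


cluster_cost = NODE_COST_PER_HOUR * NODES
input_rate = cost_per_million(NODE_COST_PER_HOUR, INPUT_MTOK_PER_NODE_HOUR)
output_rate = cost_per_million(NODE_COST_PER_HOUR, OUTPUT_MTOK_PER_NODE_HOUR)

print(f"cluster: ${cluster_cost:.2f}/hr")
print(f"input:   ${input_rate:.2f}/million tokens")
print(f"output:  ${output_rate:.2f}/million tokens")
```

Because both cost and throughput scale with the number of nodes, the per-token rate comes out the same whether it is computed for one node or for all 12.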

replies(6): >>45065474 #>>45065821 #>>45065830 #>>45065838 #>>45065925 #>>45067796 #

1. matt-p ◴[29 Aug 25 16:01 UTC] No.45065821

>>45065331 #

188M input / 80M output tokens per hour was per node I thought?

Reversing out these numbers tells us that they're paying about $2/H100/hour (or $16/hour for an 8xH100 node).

Disclaimer: one of my sites, https://www.serversearcher.com/servers/gpu, says that a one-month commit on an 8xH100 node goes for $12.91/hour. The "I'm buying the servers and putting them in colo" rate usually works out at around $10/hour, so there's scope here to reduce the cost by ~30% just by doing better/more committed purchasing.
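As a rough check on the "~30%" figure, here is a minimal Python sketch comparing the node rates mentioned in this comment against the implied $16/hour baseline. The constant and function names are illustrative, and the prices are the comment's own estimates rather than quotes.

```python
# Hourly rates for one 8xH100 node, as stated in the comment above (USD).
IMPLIED_NODE_RATE = 16.00   # ~$2/H100/hour times 8 GPUs
ONE_MONTH_COMMIT = 12.91    # listed one-month commit price
COLO_ESTIMATE = 10.00       # rough "buy the servers and colocate" rate


def savings_pct(baseline: float, alternative: float) -> float:
    """Percentage saved by paying `alternative` instead of `baseline`."""
    return (baseline - alternative) / baseline * 100


commit_savings = savings_pct(IMPLIED_NODE_RATE, ONE_MONTH_COMMIT)
colo_savings = savings_pct(IMPLIED_NODE_RATE, COLO_ESTIMATE)

print(f"one-month commit: {commit_savings:.1f}% cheaper")
print(f"colo estimate:    {colo_savings:.1f}% cheaper")
```

Under these assumptions the two options save roughly 19% and 37.5%, so the "~30%" figure falls between them.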

replies(1): >>45066005 #

2. caminanteblanco ◴[29 Aug 25 16:15 UTC] No.45066005

>>45065821 (TP) #

You were definitely right; I updated the original comment. Thanks for the correction!
