
281 points GabrielBianconi | 13 comments
1. caminanteblanco No.45065331
There was some tangentially related discussion in this post: https://news.ycombinator.com/item?id=45050415, but this cost analysis answers so many questions and gives me a better idea of how large a margin on inference many of these providers could be taking. Plus I'm sure that Google or OpenAI can get more favorable data center rates than the average Joe Schmoe.

A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr. With 188 million input tokens/hr and 80 million output tokens/hr, that comes out to around $2/million input tokens, and $4.70/million output tokens.

This is actually a lot more than Deepseek r1's rates of $0.10-$0.60/million input and $2/million output, but I'm sure major providers are not paying AWS p5 on-demand pricing.

Edit: those figures were per node, so the actual input and output prices should be divided by 12: $0.17/million input tokens and $0.39/million output.
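The per-token arithmetic above can be sketched in a few lines (a rough illustration, assuming the corrected per-node figures of 188M input and 80M output tokens/hr on one 8xH100 node at the quoted $31.40/hr AWS p5 on-demand rate):

```python
# Cost per million tokens given an hourly node rate and per-node throughput.
# Assumes throughput figures are per 8xH100 node, per the correction above.
def cost_per_million(node_rate_per_hr, million_tokens_per_hr):
    return node_rate_per_hr / million_tokens_per_hr

aws_node_rate = 31.40  # AWS p5 on-demand, one 8xH100 node, $/hr
print(round(cost_per_million(aws_node_rate, 188), 2))  # ~0.17 $/M input
print(round(cost_per_million(aws_node_rate, 80), 2))   # ~0.39 $/M output
```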

replies(6): >>45065474 #>>45065821 #>>45065830 #>>45065838 #>>45065925 #>>45067796 #
2. No.45065474
3. matt-p No.45065821
188M input / 80M output tokens per hour was per node I thought?

Reversing out these numbers tells us that they're paying about $2/H100/hour (or $16/hour for an 8xH100 node).

Disclaimer (one of my sites): https://www.serversearcher.com/servers/gpu says that a one-month commit on an 8xH100 node goes for $12.91/hour. The "I'm buying the servers and putting them in colo" rate usually works out to around $10/hour, so there's scope here to reduce the cost by ~30% just by doing better/more committed purchasing.

replies(1): >>45066005 #
4. caminanteblanco No.45065830
Ok, so the authors apparently used Atlas Cloud hosting, which charges $1.80 per H100/hr. That would change the overall cost to around $0.08/million input and $0.18/million output, which seems much more in line with massive inference margins for major providers.
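The same per-token math at the $1.80/H100/hr rate (a quick sketch, assuming the per-node throughput of 188M input / 80M output tokens per hour and 8 GPUs per node):

```python
# Recompute cost per million tokens at $1.80/H100/hr, 8 GPUs per node.
node_rate = 1.80 * 8  # $14.40/hr for one 8xH100 node
print(round(node_rate / 188, 2))  # ~0.08 $/M input
print(round(node_rate / 80, 2))   # ~0.18 $/M output
```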
5. paxys No.45065838
According to the post their costs were $0.20/1M output tokens (on cloud GPUs), so your numbers are off somewhere.
6. zipy124 No.45065925
AWS is absolutely not cheap, and never has been. You want the Hetzner of the GPU world, like runpod.io, where H100s are $2 an hour, so $16/hr for 8; that's already half of AWS. And if you're looking for 96, you can almost certainly get a volume discount on top.

An H100 costs about $32k; amortized over 3-5 years that's roughly $1.22 to $0.73 per hour, so even after adding in electricity, CPU/RAM, etc., runpod.io is running much closer to the actual cost than AWS.

replies(2): >>45071097 #>>45071121 #
7. caminanteblanco No.45066005
You were definitely right, I updated the original comment. Thanks for your correction!
8. bluedino No.45067796
> A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr

And what stinks is that you can't even build a Dell/HPE server like this online. You have to 'request a quote' for an 'AI Server'

Going through SuperMicro, you're looking at about $60k for the server, plus 8 GPU's at $25,000 each, so you're close to $300,000 for an 8 GPU node.

Now, that doesn't include networking, storage, racks, electricity, cooling, someone to set it all up for you, $1,000 DAC cables, NVIDIA middleware, or downtime, since H100s are the flakiest pieces of junk ever and will need to be replaced every so often...

Setting up a 96 H100 cluster (12 of those puppies) in this case is probably going to cost you $4-5 million. But it should cost less than AWS after a year and a half.
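The "year and a half" break-even claim checks out (a back-of-the-envelope sketch; the $4.5M midpoint of the $4-5M range is an assumption for illustration):

```python
# Break-even of self-hosting vs AWS on-demand for 12x 8xH100 nodes.
aws_rate_hr = 376.80       # 12 nodes at $31.40/hr each
capex = 4_500_000          # assumed midpoint of the $4-5M estimate
breakeven_hours = capex / aws_rate_hr
print(round(breakeven_hours / (365 * 24), 1))  # ~1.4 years
```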

replies(2): >>45068071 #>>45071538 #
9. Tepix No.45068071
I think you can get the server itself quite a bit cheaper than $60k. I found a barebone for around €19,400 at https://www.lambda-tek.de/Supermicro-SYS-821GE-TNHR-sh/B4760...
10. fooker No.45071097
H100 was 32k three years ago.

Significantly cheaper now that most cloud providers are buying Blackwell.

11. mountainriver No.45071121
Runpod's network is the worst I've ever seen; their infra in general is terrible. It was started by Comcast execs, go figure.

Their GPU availability is amazing though

replies(1): >>45071901 #
12. Spooky23 No.45071538
> And what stinks is that you can't even build a Dell/HPE server like this online. You have to 'request a quote' for an 'AI Server'

The hot parts are/were on allocation at both vendors. They try to suss out your use case and redirect you to less constrained parts.

13. thundergolfer No.45071901
Is the network just slow, or does it have outages?