
281 points GabrielBianconi | 13 comments
1. caminanteblanco No.45065331
There was some tangentially related discussion in this post: https://news.ycombinator.com/item?id=45050415, but this cost analysis answers so many questions and gives me a better idea of how large a margin on inference many of these providers could be taking. Plus I'm sure that Google or OpenAI can get more favorable data center rates than the average Joe Schmoe.

A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr. With 188 million input tokens/hr and 80 million output tokens/hr, that comes out to around $2/million input tokens, and $4.70/million output tokens.

This is actually a lot more than Deepseek r1's rates of $0.10-$0.60/million input and $2/million output, but I'm sure major providers are not paying AWS p5 on-demand pricing.

Edit: those figures were per node, so the actual input and output prices should be divided by 12: $0.17/million input tokens and $0.39/million output.
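The per-token arithmetic above can be sketched in a few lines (a rough illustration, assuming the corrected per-node figures of 188M input and 80M output tokens/hr on one 8xH100 node at the quoted $31.40/hr AWS p5 on-demand rate):

```python
# Cost per million tokens given an hourly node rate and per-node throughput.
# Assumes throughput figures are per 8xH100 node, per the correction above.
def cost_per_million(node_rate_per_hr, million_tokens_per_hr):
    return node_rate_per_hr / million_tokens_per_hr

aws_node_rate = 31.40  # AWS p5 on-demand, one 8xH100 node, $/hr
print(round(cost_per_million(aws_node_rate, 188), 2))  # ~0.17 $/M input
print(round(cost_per_million(aws_node_rate, 80), 2))   # ~0.39 $/M output
```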

replies(6): >>45065474 #>>45065821 #>>45065830 #>>45065838 #>>45065925 #>>45067796 #
2. No.45065474
3. matt-p No.45065821
188M input / 80M output tokens per hour was per node I thought?

Reversing out these numbers tells us that they're paying about $2/H100/hour (or $16/hour for an 8xH100 node).

Disclaimer (one of my sites): https://www.serversearcher.com/servers/gpu says that a one-month commit on an 8xH100 node goes for $12.91/hour. The "I'm buying the servers and putting them in colo" rate usually works out to around $10/hour, so there's scope here to reduce the cost by ~30% just by doing better/more committed purchasing.

replies(1): >>45066005 #
4. caminanteblanco No.45065830
Ok, so the authors apparently used Atlas Cloud hosting, which charges $1.80 per H100/hr. That would change the overall cost to around $0.08/million input and $0.18/million output, which seems much more in line with massive inference margins for major providers.
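The same per-token math at the $1.80/H100/hr rate (a quick sketch, assuming the per-node throughput of 188M input / 80M output tokens per hour and 8 GPUs per node):

```python
# Recompute cost per million tokens at $1.80/H100/hr, 8 GPUs per node.
node_rate = 1.80 * 8  # $14.40/hr for one 8xH100 node
print(round(node_rate / 188, 2))  # ~0.08 $/M input
print(round(node_rate / 80, 2))   # ~0.18 $/M output
```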
5. paxys No.45065838
According to the post their costs were $0.20/1M output tokens (on cloud GPUs), so your numbers are off somewhere.
6. zipy124 No.45065925
AWS is absolutely not cheap, and never has been. You want the Hetzner of the GPU world, like runpod.io, where H100s are $2 an hour, so $16/hr for 8; that's already half of AWS. And if you're looking for 96, you can almost certainly get a volume discount on top.

An H100 costs about $32k; amortized over 3-5 years that's roughly $1.22 to $0.73 per hour, so even after adding in electricity, CPU/RAM, etc., runpod.io is running much closer to the actual cost than AWS.

replies(2): >>45071097 #>>45071121 #
7. caminanteblanco No.45066005
You were definitely right, I updated the original comment. Thanks for your correction!
8. bluedino No.45067796
> A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr

And what stinks is that you can't even build a Dell/HPE server like this online. You have to 'request a quote' for an 'AI Server'

Going through SuperMicro, you're looking at about $60k for the server, plus 8 GPU's at $25,000 each, so you're close to $300,000 for an 8 GPU node.

Now, that doesn't include networking, storage, racks, electricity, cooling, someone to set it all up for you, $1,000 DAC cables, NVIDIA middleware, or downtime, since H100s are the flakiest pieces of junk ever and will need to be replaced every so often...

Setting up a 96 H100 cluster (12 of those puppies) in this case is probably going to cost you $4-5 million. But it should cost less than AWS after a year and a half.
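The "year and a half" break-even claim checks out (a back-of-the-envelope sketch; the $4.5M midpoint of the $4-5M range is an assumption for illustration):

```python
# Break-even of self-hosting vs AWS on-demand for 12x 8xH100 nodes.
aws_rate_hr = 376.80       # 12 nodes at $31.40/hr each
capex = 4_500_000          # assumed midpoint of the $4-5M estimate
breakeven_hours = capex / aws_rate_hr
print(round(breakeven_hours / (365 * 24), 1))  # ~1.4 years
```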

replies(2): >>45068071 #>>45071538 #
9. Tepix No.45068071
I think you can get the server itself quite a bit cheaper than $60k. I found a barebone for around €19,400 at https://www.lambda-tek.de/Supermicro-SYS-821GE-TNHR-sh/B4760...
10. fooker No.45071097
H100 was 32k three years ago.

Significantly cheaper now that most cloud providers are buying Blackwell.

11. mountainriver No.45071121
Runpod's network is the worst I've ever seen; their infra in general is terrible. It was started by Comcast execs, go figure.

Their GPU availability is amazing though

replies(1): >>45071901 #
12. Spooky23 No.45071538
> And what stinks is that you can't even build a Dell/HPE server like this online. You have to 'request a quote' for an 'AI Server'

The hot parts are/were on allocation at both vendors. They try to suss out your use case and redirect you to less constrained parts.

13. thundergolfer No.45071901
Is the network just slow, or does it have outages?