
281 points GabrielBianconi | 2 comments
caminanteblanco ◴[] No.45065331[source]
There was some tangentially related discussion in this post: https://news.ycombinator.com/item?id=45050415, but this cost analysis answers so many questions and gives me a better idea of how huge a margin on inference a lot of these providers could be taking. Plus, I'm sure Google or OpenAI can get more favorable data center rates than the average Joe Schmoe.

A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 GPUs (12 nodes) you're looking at $376.80/hr. With 188 million input tokens/hr and 80 million output tokens/hr, that comes out to around $2/million input tokens and $4.70/million output tokens.

This is actually a lot more than DeepSeek R1's rates of $0.10-$0.60/million input and $2/million output, but I'm sure major providers are not paying AWS p5 on-demand pricing.

Edit: those figures were per node, so the actual prices should be divided by 12: $0.17/million input tokens and $0.39/million output tokens.
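
A quick Python sanity check of that arithmetic (the $31.40/hr p5 rate and the 188M/80M tokens-per-hour-per-node throughput are the figures quoted above, not independently measured):

    # Inputs as quoted in the comment above
    node_cost_per_hr = 31.40            # 8x H100 p5 node, $/hr (on-demand, as quoted)
    nodes = 96 // 8                     # 96 GPUs -> 12 nodes
    cluster_cost_per_hr = node_cost_per_hr * nodes   # $376.80/hr

    input_tok_per_node_hr = 188e6       # per-node throughput
    output_tok_per_node_hr = 80e6

    # Cost per million tokens: per-node cost over per-node throughput
    print(node_cost_per_hr / (input_tok_per_node_hr / 1e6))    # ~$0.17 / M input
    print(node_cost_per_hr / (output_tok_per_node_hr / 1e6))   # ~$0.39 / M output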

replies(6): >>45065474 #>>45065821 #>>45065830 #>>45065838 #>>45065925 #>>45067796 #
zipy124 ◴[] No.45065925[source]
AWS is absolutely not cheap, and never has been. You want to look for the Hetzner of the GPU world, like runpod.io, where H100s are $2 an hour, so $16/hr for 8; that's already half of AWS's price. And if you're looking for 96, you can almost certainly get a volume discount.

An H100 costs about $32k; amortized over 3-5 years that's roughly $1.22 to $0.73 per hour, so even adding in electricity, CPU/RAM, etc., runpod.io is pricing much closer to the actual cost than AWS is.
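
The amortization math, assuming the $32k sticker price above and 24/7 utilization (power, hosting, and CPU/RAM overhead deliberately left out):

    h100_price = 32_000                 # USD, as quoted above
    hours_3y = 3 * 365 * 24             # 26,280 hours
    hours_5y = 5 * 365 * 24             # 43,800 hours
    print(h100_price / hours_3y)        # ~$1.22/hr over 3 years
    print(h100_price / hours_5y)        # ~$0.73/hr over 5 years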

replies(2): >>45071097 #>>45071121 #
1. mountainriver ◴[] No.45071121[source]
Runpod's network is the worst I've ever seen, and their infra in general is terrible. It was started by Comcast execs, go figure.

Their GPU availability is amazing, though.

replies(1): >>45071901 #
2. thundergolfer ◴[] No.45071901[source]
Is the network just slow, or does it have outages?