
281 points GabrielBianconi | 1 comment
34679 ◴[] No.45064819[source]
"By deploying this implementation locally, it translates to a cost of $0.20/1M output tokens"

Is that just the cost of electricity, or does it include the cost of the GPUs spread out over their predicted lifetime?

replies(3): >>45064954 #>>45066023 #>>45071720 #
1. zipy124 ◴[] No.45066023[source]
This is all costs included. That's 22k tokens per second per node, i.e. per 8 H100s. With 12 nodes they get 264k tokens per second, or about 950 million tokens an hour. At $2 an hour per H100, which is what they go for on services such as runpod.io, that works out to roughly $0.202 per million tokens (cheaper if not paying spot price, plus volume discounts).
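The arithmetic in the comment above can be sketched out directly. This is a minimal back-of-the-envelope script using the figures stated there (12 nodes of 8 H100s, 22k output tokens/sec per node, $2/hr per H100); the variable names are illustrative, not from any source.

```python
# Cost-per-token estimate from the comment's figures (all values assumptions
# taken from the thread, not measured here).
NODES = 12
GPUS_PER_NODE = 8
TOKENS_PER_SEC_PER_NODE = 22_000   # output tokens/sec per 8-H100 node
GPU_HOURLY_USD = 2.00              # approximate H100 rental price (e.g. runpod.io)

tokens_per_hour = NODES * TOKENS_PER_SEC_PER_NODE * 3600      # ~950.4M tokens/hr
cluster_hourly_cost = NODES * GPUS_PER_NODE * GPU_HOURLY_USD  # $192/hr for the cluster
cost_per_million = cluster_hourly_cost / (tokens_per_hour / 1e6)

print(f"{tokens_per_hour / 1e6:.1f}M tokens/hour")
print(f"${cost_per_million:.4f} per 1M output tokens")
```

Running this reproduces the figures in the comment: about 950M tokens an hour at roughly $0.20 per million output tokens.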