(lmsys.org)

281 points GabrielBianconi | 1 comments | 29 Aug 25 14:07 UTC

34679 ◴[29 Aug 25 14:44 UTC] No.45064819[source]▶

>>45064329 (OP) #

"By deploying this implementation locally, it translates to a cost of $0.20/1M output tokens"

Is that just the cost of electricity, or does it include the cost of the GPUs spread out over their predicted lifetime?

replies(3): >>45064954 #>>45066023 #>>45071720 #

dragonslayer56 ◴[29 Aug 25 14:54 UTC] No.45064954[source]▶

>>45064819 #

"Our implementation, shown in the figure above, runs on 12 nodes in the Atlas Cloud, each equipped with 8 H100 GPUs."

Maybe the cost of renting?

replies(2): >>45065147 #>>45065503 #

ollybee ◴[29 Aug 25 15:38 UTC] No.45065503[source]▶

>>45064954 #

H100s can be $2 an hour, so $192 an hour for the full cluster. They report 22k tokens per second, so ~80 million an hour; that's $16 an hour at $0.2 per million. Maybe a bit more for input tokens, but it seems a long way off.

replies(1): >>45066003 #

1. zipy124 ◴[29 Aug 25 16:15 UTC] No.45066003[source]▶

>>45065503 #

I think you misread. That's 22k tokens per second per node, so per 8 H100s. With 12 nodes they get 264k tokens per second, or 950 million an hour. This gets you to roughly $0.2021 per million at $2 an hour per GPU.
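
For anyone who wants to check the arithmetic, here is a minimal sketch of both readings of the throughput figure. The $2 per GPU-hour rental price is the assumption used in this thread, not a quoted price, and the function name `cost_per_million_tokens` is made up for illustration.

```python
def cost_per_million_tokens(gpus, usd_per_gpu_hour, tokens_per_second):
    """Rental cost in USD per 1M output tokens for a cluster."""
    cluster_usd_per_hour = gpus * usd_per_gpu_hour
    tokens_per_hour = tokens_per_second * 3600
    return cluster_usd_per_hour / (tokens_per_hour / 1_000_000)


NODES = 12
GPUS_PER_NODE = 8
USD_PER_GPU_HOUR = 2.0           # assumption from the thread
TOKENS_PER_SECOND_FIGURE = 22_000  # the reported ~22k tokens/s

gpus = NODES * GPUS_PER_NODE     # 96 H100s, $192/hour at the assumed price

# Reading 1: 22k tokens/s is the throughput of the whole cluster.
whole_cluster = cost_per_million_tokens(
    gpus, USD_PER_GPU_HOUR, TOKENS_PER_SECOND_FIGURE
)

# Reading 2: 22k tokens/s is per node, so the cluster does 12x that.
per_node = cost_per_million_tokens(
    gpus, USD_PER_GPU_HOUR, TOKENS_PER_SECOND_FIGURE * NODES
)

print(f"whole-cluster reading: ${whole_cluster:.2f} per 1M tokens")
print(f"per-node reading:      ${per_node:.2f} per 1M tokens")
```

Under the per-node reading this comes out to about $0.20 per million, matching the article's number; under the whole-cluster reading it would be about $2.42 per million.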

Deploying DeepSeek on 96 H100 GPUs