281 points by GabrielBianconi | 1 comment
34679 No.45064819
"By deploying this implementation locally, it translates to a cost of $0.20/1M output tokens"

Is that just the cost of electricity, or does it include the cost of the GPUs spread out over their predicted lifetime?

dragonslayer56 No.45064954
"Our implementation, shown in the figure above, runs on 12 nodes in the Atlas Cloud, each equipped with 8 H100 GPUs."

Maybe the cost of renting?

ollybee No.45065503
H100s can be $2 an hour, so $192 an hour for the full cluster. They report 22k tokens per second, so ~80 million an hour; that's $16 an hour at $0.20 per million. Maybe a bit more for input tokens, but it seems a long way off.

zipy124 No.45066003
I think you misread. That's 22k tokens per second per node, so per 8 H100s. With 12 nodes they get 264k tokens per second, or about 950 million an hour. That gets you to roughly $0.202 per million at $2 an hour.
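The arithmetic in this exchange can be checked with a short sketch. It assumes rental cost only (the rough $2/hour H100 figure from the thread), full utilization, and output tokens only; electricity, amortized hardware, and input tokens are ignored. The function name `cost_per_million_tokens` is hypothetical, chosen for illustration.

```python
def cost_per_million_tokens(gpu_hourly_usd, gpus, tokens_per_second):
    """Rental cost in USD per 1M output tokens, assuming the GPUs run flat out."""
    hourly_usd = gpu_hourly_usd * gpus
    tokens_per_hour = tokens_per_second * 3600
    return hourly_usd / (tokens_per_hour / 1_000_000)


NODES = 12
GPUS_PER_NODE = 8
PER_NODE_TPS = 22_000   # reported throughput, per node (8 H100s)
H100_HOURLY = 2.0       # assumed rental price per GPU-hour

# Misreading: 22k tok/s treated as the throughput of all 96 GPUs.
misread = cost_per_million_tokens(H100_HOURLY, NODES * GPUS_PER_NODE, PER_NODE_TPS)

# Corrected: 22k tok/s per node, so 264k tok/s across 12 nodes.
corrected = cost_per_million_tokens(H100_HOURLY, NODES * GPUS_PER_NODE, PER_NODE_TPS * NODES)

# A single node gives the same figure, since cost and throughput both scale by 12.
per_node = cost_per_million_tokens(H100_HOURLY, GPUS_PER_NODE, PER_NODE_TPS)

print(f"{misread:.2f}")    # 2.42
print(f"{corrected:.4f}")  # 0.2020
print(f"{per_node:.4f}")   # 0.2020
```

The per-node line shows why the misreading mattered: dividing one node's throughput into the whole cluster's hourly cost overstates the price by exactly the node count.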