
281 points GabrielBianconi | 1 comment
34679 ◴[] No.45064819[source]
"By deploying this implementation locally, it translates to a cost of $0.20/1M output tokens"

Is that just the cost of electricity, or does it include the cost of the GPUs spread out over their predicted lifetime?

replies(3): >>45064954 #>>45066023 #>>45071720 #
1. zipy124 ◴[] No.45066023[source]
This is all costs included. That's 22k tokens per second per node, i.e. per 8 H100s. With 12 nodes they get 264k tokens per second, or about 950 million tokens an hour. At $2 an hour per H100, which is what they go for on services such as runpod.io, that works out to roughly $0.202 per million tokens (cheaper if not paying spot price, plus volume discounts).
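The arithmetic in the comment above can be sketched out directly. This is a minimal back-of-the-envelope script using the figures stated there (12 nodes of 8 H100s, 22k output tokens/sec per node, $2/hr per H100); the variable names are illustrative, not from any source.

```python
# Cost-per-token estimate from the comment's figures (all values assumptions
# taken from the thread, not measured here).
NODES = 12
GPUS_PER_NODE = 8
TOKENS_PER_SEC_PER_NODE = 22_000   # output tokens/sec per 8-H100 node
GPU_HOURLY_USD = 2.00              # approximate H100 rental price (e.g. runpod.io)

tokens_per_hour = NODES * TOKENS_PER_SEC_PER_NODE * 3600      # ~950.4M tokens/hr
cluster_hourly_cost = NODES * GPUS_PER_NODE * GPU_HOURLY_USD  # $192/hr for the cluster
cost_per_million = cluster_hourly_cost / (tokens_per_hour / 1e6)

print(f"{tokens_per_hour / 1e6:.1f}M tokens/hour")
print(f"${cost_per_million:.4f} per 1M output tokens")
```

Running this reproduces the figures in the comment: about 950M tokens an hour at roughly $0.20 per million output tokens.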