
281 points GabrielBianconi | 5 comments
brilee ◴[] No.45065876[source]
For those commenting on cost per token:

This throughput assumes 100% utilization. A bunch of things raise the cost at scale:

- There are no on-demand GPUs at this scale. You have to rent them on multi-year contracts. So you have to lock in some number of GPUs for your maximum throughput (or some sufficiently high percentile), not your average throughput. Your peak throughput during west coast business hours is probably 2-3x higher than the throughput at tail hours (east coast mornings, west coast evenings).

- GPUs are often regionally locked due to data processing issues + latency issues. Thus, it's difficult to utilize these GPUs overnight because Asia doesn't want their data sent to the US and the US doesn't want their data sent to Asia.

These two factors mean that GPU utilization comes in at 10-20%. Now, if you're a massive company that spends a lot of money on training new models, you could conceivably slot in RL inference or model training to happen in these off-peak hours, maximizing utilization.

But for those companies purely specializing in inference, I would _not_ assume that these 90% margins are real. I would guess that even when it seems "10x cheaper", you're only seeing margins of 50%.
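
A rough sketch of the arithmetic behind this, assuming GPU spend is fixed by the contract so that cost per token scales as 1/utilization. The function names and the normalized price and cost figures are illustrative assumptions, not numbers from any provider:

```python
def effective_cost_per_token(cost_at_full_utilization: float, utilization: float) -> float:
    """Cost per served token once idle GPU time is paid for.

    Assumes the GPU bill is fixed (reserved capacity), so serving fewer
    tokens on the same hardware raises the cost of each token.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cost_at_full_utilization / utilization


def gross_margin(price_per_token: float, cost_per_token: float) -> float:
    """Fraction of the price left over after paying for GPUs."""
    return 1.0 - cost_per_token / price_per_token


# Illustrative numbers only: price normalized to 1.0, and a cost of 0.10
# at 100% utilization (the "90% margin" case).
PRICE = 1.0
COST_AT_FULL_UTILIZATION = 0.10

for utilization in (1.0, 0.5, 0.2, 0.1):
    cost = effective_cost_per_token(COST_AT_FULL_UTILIZATION, utilization)
    print(f"utilization {utilization:.0%}: margin {gross_margin(PRICE, cost):.0%}")
```

Under these assumptions a 90% margin at full utilization drops to 50% at 20% utilization and to zero at 10%.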

koliber ◴[] No.45072198[source]
These are great points.

However, I don't think these companies provision capacity for peak usage and let it idle during off-peak hours. I think they provision at something a bit above average, and aim for 100% utilization for as many hours of the day as possible. When there is not enough capacity to meet demand, they use various service degradation methods and/or load shedding.
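
To make the trade-off concrete, here is a toy comparison of the two provisioning strategies. The demand profile (8 peak hours at 3x the off-peak level) and the function name are hypothetical, chosen only to illustrate the idea:

```python
def provisioning_stats(hourly_demand: list[float], capacity: float) -> tuple[float, float]:
    """Return (utilization, shed_fraction) for a fixed capacity.

    Demand above capacity in any hour is assumed to be dropped (load
    shedding) rather than queued for later.
    """
    served = [min(demand, capacity) for demand in hourly_demand]
    total_demand = sum(hourly_demand)
    total_served = sum(served)
    utilization = total_served / (capacity * len(hourly_demand))
    shed_fraction = (total_demand - total_served) / total_demand
    return utilization, shed_fraction


# Hypothetical day: 8 peak hours at 3.0 units, 16 off-peak hours at 1.0 unit.
demand = [3.0] * 8 + [1.0] * 16
average = sum(demand) / len(demand)  # about 1.67 units

# Strategy A: provision for peak. Strategy B: provision a bit above average.
for label, capacity in (("peak", max(demand)), ("above average", 2.0)):
    utilization, shed = provisioning_stats(demand, capacity)
    print(f"{label}: utilization {utilization:.0%}, demand shed {shed:.0%}")
```

With this made-up profile, provisioning for peak gives about 56% utilization with nothing shed, while provisioning at 2.0 units gives about 67% utilization at the price of shedding 20% of demand.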

1. mcny ◴[] No.45072278[source]
Is this why I've gotten Anthropic/Claude emails every single day since I signed up for their status updates? I just assumed they were working hard on production bugs, but in light of this comment: if you don't hit capacity constraints every day, are you wasting money?
2. chii ◴[] No.45072607[source]
This is true for all capital equipment - whether it's a GPU, a bore drill, or an earth mover.

You want to make use of it at as close to 100% as possible.

3. hvb2 ◴[] No.45072945[source]
With the caveat that GPUs depreciate a bit faster obviously. A drill is still a drill next year or a decade from now.
4. koliber ◴[] No.45073590[source]
Just like at an all-you-can-eat buffet.
5. apetrov ◴[] No.45073934{3}[source]
yes, but the capital is still tied up in it. you want it to have a meaningful ROI, not sit in a warehouse.