Based on their S1 filing and public statements, the average cost per WSE system for their (~90% of their total revenue) largest customer is ~$1.36M, and I’ve heard “retail” pricing of $2.5M per system. They are also 15U and due to power and additional support equipment take up an entire rack.
The other thing people don’t seem to be getting in this thread that just to hold the weights for 405B at FP16 requires 19 of their systems since it is SRAM only… rounding up to 20 to account for program code + KV cache for the user context would mean 20 systems/racks, so well over $20M. The full rack (including support equipment) also consumes 23kW, so we are talking nearly half a megawatt and ~$30M for them to be getting this performance on Llama 405B