(lmsys.org)

281 points GabrielBianconi | 1 comment | 29 Aug 25 14:07 UTC

1. s46dxc5r7tv8 [29 Aug 25 15:31 UTC] No.45065424

Separating the prefill and decode stages with sglang is quite nifty! Normally a single 8xH100 node would barely be able to hold the 4-bit quantization of the model, before even accounting for the KV cache. Running one prefill node for every three decode nodes is also fascinating. Nice writeup.
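Applying the 1:3 prefill-to-decode node ratio from the comment to the 96 GPUs in the title works out as below. This is a minimal sketch that assumes 8 GPUs per node (the "8xH100" node size mentioned above); `split_nodes` is a hypothetical helper written for illustration, not part of sglang.

```python
def split_nodes(total_gpus: int, gpus_per_node: int,
                prefill_ratio: int, decode_ratio: int) -> tuple[int, int]:
    """Hypothetical helper: split a cluster into (prefill, decode) node counts
    at a fixed prefill:decode ratio, as in prefill-decode disaggregation."""
    if total_gpus % gpus_per_node:
        raise ValueError("total_gpus must be a multiple of gpus_per_node")
    nodes = total_gpus // gpus_per_node        # 96 GPUs / 8 per node = 12 nodes
    group = prefill_ratio + decode_ratio       # nodes per repeating ratio unit
    if nodes % group:
        raise ValueError("node count must divide evenly into the ratio")
    unit = nodes // group                      # how many ratio units fit
    return prefill_ratio * unit, decode_ratio * unit


prefill, decode = split_nodes(96, 8, 1, 3)
print(prefill, decode)  # 3 9
```

Under these assumptions, 96 GPUs form 12 nodes, which split into 3 prefill nodes (24 GPUs) and 9 decode nodes (72 GPUs).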


Deploying DeepSeek on 96 H100 GPUs