
281 points | GabrielBianconi | 2 comments
1. ozgune No.45066036
The SGLang Team has a follow-up blog post that talks about DeepSeek inference performance on GB200 NVL72: https://lmsys.org/blog/2025-06-16-gb200-part-1/

Just in case you have $3-4M lying around somewhere for some high-quality inference. :)

SGLang quotes a 2.5-3.4x speedup compared to H100s. They also note that more optimizations are coming, but they haven't yet published part 2 of the blog post.

replies(1): >>45074618
2. aurareturn No.45074618
Isn't Blackwell optimized for FP4? This blog post runs DeepSeek at FP8, which is probably the sweet spot, but new models with native FP4 training and inference would be drastically faster than FP8 on Blackwell.
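
For the curious, here's a minimal sketch of what block-scaled FP4 (E2M1) quantization looks like. This is illustrative only, not SGLang's or NVIDIA's kernel code: the block size of 16 and the plain float per-block scale are assumptions loosely modeled on NVFP4 (real kernels store FP8 scales and pack two 4-bit values per byte).

    import numpy as np

    # All non-negative magnitudes representable in FP4 E2M1
    # (1 sign bit, 2 exponent bits, 1 mantissa bit)
    E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def quantize_fp4_block(x, block=16):
        # Scale each block so its largest magnitude maps to 6.0 (the E2M1
        # max), then snap every element to the nearest grid point.
        x = x.reshape(-1, block)
        scale = np.abs(x).max(axis=1, keepdims=True) / 6.0
        scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
        scaled = np.abs(x) / scale
        # Nearest representable E2M1 magnitude for each element
        idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
        q = np.sign(x) * E2M1_GRID[idx]
        return q, scale  # dequantize as q * scale

    w = np.random.randn(4, 16).astype(np.float32)
    q, s = quantize_fp4_block(w)
    print("max abs error:", np.abs(w - q * s).max())

With only eight representable magnitudes per sign, FP4 halves weight memory traffic versus FP8 and, on NVIDIA's Blackwell spec sheets, roughly doubles peak tensor-core throughput, which is where the "drastically faster" expectation comes from.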