(www.tomshardware.com)

521 points hd4 | 1 comment | 20 Oct 25 12:31 UTC

Paper: https://dl.acm.org/doi/10.1145/3731569.3764815

1. nickysielicki [21 Oct 25 15:21 UTC] No.45656934

> Distributed executor: Inference engines support model parallelism via distributed executors (e.g., Ray [32] and NCCL [9]), whose initialization takes tens of seconds.

I mean, it really shouldn't take tens of seconds for that initialization to occur. There's no good fundamental reason it should take that long. It's just bloat.


Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system