> Distributed executor: Inference engines support model parallelism via distributed executors (e.g., Ray [32] and NCCL [9]), whose initialization takes tens of seconds.
I mean, it really shouldn't take tens of seconds for those initialization(s) to occur. There's no good fundamental reason that it should take that long. It's just bloat.