←back to thread

521 points hd4 | 1 comments | | HN request time: 0s | source
Show context
kilotaras ◴[] No.45644776[source]
Alibaba Cloud claims to reduce Nvidia GPU used for serving unpopular models by 82% (emphasis mine)

> 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found

Instead of 1192 GPUs they now use 213 for serving those requests.

replies(5): >>45645037 #>>45647752 #>>45647863 #>>45651559 #>>45653363 #
bee_rider ◴[] No.45647863[source]
I’m slightly confuse as to how all this works. Do the GPUs just sit there with the models on them when the models are not in use?

I guess I’d assumed this sort of thing would be allocated dynamically. Of course, there’s a benefit to minimizing the number of times you load a model. But surely if a GPU+model is idle for more than a couple minutes it could be freed?

(I’m not an AI guy, though—actually I’m used to asking SLURM for new nodes with every run I do!)

replies(6): >>45648058 #>>45648291 #>>45648653 #>>45649219 #>>45650208 #>>45653517 #
1. smallnix ◴[] No.45648291[source]
> I guess I’d assumed this sort of thing would be allocated dynamically

At the scale of a hyperscaler I think Alibaba is the one that would be doing that. AWS, Azure and I assume Alibaba do lease/rent data centers, but someone has to own the servers / GPU racks. I know there are specialized companies like nscale (and more further down the chain) in the mix, but I always assumed they only lease out fixed capacity.