Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system

(www.tomshardware.com)

521 points hd4 | 1 comments | 20 Oct 25 12:31 UTC

Paper: https://dl.acm.org/doi/10.1145/3731569.3764815

kilotaras ◴[20 Oct 25 15:05 UTC] No.45644776[source]▶

Alibaba Cloud claims to have reduced the number of Nvidia GPUs used for serving unpopular models by 82% (emphasis mine)

> 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found

Instead of 1192 GPUs they now use 213 for serving those requests.
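
As a quick sanity check on the headline number, here is a minimal Python sketch that recomputes the saving from the two GPU counts quoted above. The variable names are my own and do not come from the paper.

```python
# GPU counts as quoted in the article and paper
gpus_before = 1192
gpus_after = 213

# Fraction of the GPUs for these models that is no longer needed
saving = 1 - gpus_after / gpus_before
print(f"{saving:.1%}")  # 82.1%
```

So the 82% applies to the GPUs serving these models, not to the whole fleet.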

replies(5): >>45645037 #>>45647752 #>>45647863 #>>45651559 #>>45653363 #

hinkley ◴[21 Oct 25 01:43 UTC] No.45651559[source]▶

>>45644776 #

So 82% of 17.7%?

14.5% is worth a raise at least. But it’s still misleading.
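
For anyone checking the 14.5%, it is simply the claimed saving multiplied by the share of the fleet it applies to. A small Python sketch, with variable names that are my own and assuming the 82% applies to the whole 17.7% slice:

```python
# Share of all GPUs allocated to the unpopular models (from the quote)
unpopular_gpu_share = 0.177
# Claimed saving on that slice of the fleet
saving_within_slice = 0.82

# Saving expressed as a share of the whole fleet
fleet_wide_saving = unpopular_gpu_share * saving_within_slice
print(f"{fleet_wide_saving:.1%}")  # 14.5%
```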

replies(1): >>45668991 #

abejfehr ◴[22 Oct 25 13:39 UTC] No.45668991[source]▶

>>45651559 #

I don't think that's what this is saying. Isn't it that 100 - ~82 = 17.7%?

replies(1): >>45673016 #

1. hinkley ◴[22 Oct 25 18:13 UTC] No.45673016[source]▶

>>45668991 #

That is a confusing coincidence, but no.

> Reserving full GPU instances for these models leads to allocating 17.7% of our GPUs to serve only 1.35% of requests

> Deployment results show that Aegaeon reduces the number of GPUs required for serving these models from 1,192 to 213, highlighting an 82% GPU resource saving.

82.3% of their GPUs were serving 98.65% of all traffic. If they reduced the cluster size, they got it to 96.3% of their GPUs serving 98.65% of their traffic. If they reallocated those, which is more likely, then 96.8% of their GPUs are serving 98.65% of all requests, or around 18% more capacity for popular requests on the same hardware.
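
The arithmetic behind those two scenarios can be sketched in Python. This is a rough reconstruction, assuming the 17.7% and the 1,192 GPUs describe the same fleet (the paper does not state the fleet size, so it is inferred here). The function and variable names are my own, and the results may differ from hand-rounded figures in the last digit.

```python
def scenarios(unpopular_share, gpus_before, gpus_after):
    """Return (shrunk_share, realloc_share, extra_capacity) for popular models."""
    total = gpus_before / unpopular_share   # inferred fleet size (assumption)
    popular = total - gpus_before           # GPUs already serving popular models
    freed = gpus_before - gpus_after        # GPUs released by pooling

    # Scenario 1: freed GPUs are removed from the cluster
    shrunk_share = popular / (popular + gpus_after)
    # Scenario 2: freed GPUs are reallocated to popular models
    realloc_share = (popular + freed) / total
    # Extra capacity for popular models in scenario 2
    extra_capacity = freed / popular
    return shrunk_share, realloc_share, extra_capacity


shrunk, realloc, extra = scenarios(0.177, 1192, 213)
print(f"{shrunk:.1%} {realloc:.1%} {extra:.1%}")  # 96.3% 96.8% 17.7%
```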
