They are working with tiny models. Not sure how well it'd scale to bigger models (if at all).
replies(1):
> Our current deployment runs in a cross-region cluster comprising 213 H20 GPUs, serving 28 models in the 1.8B–7B range (TP=1) and 19 models in the 32B–72B range (TP=4).
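To make the TP numbers concrete: TP=1 means each small model fits on a single GPU, while TP=4 shards each large model's weights across four GPUs. One copy of every model would take 28×1 + 19×4 = 104 GPUs, so the remaining capacity presumably goes to additional replicas. The reply doesn't name its serving stack; below is a minimal sketch of how such a split might look with vLLM, where the model names and all other specifics are assumptions for illustration, not the poster's actual setup.

```python
# Hypothetical sketch: serving small models at TP=1 and large models at TP=4.
# vLLM is assumed here; the original comment does not say what they use.
from vllm import LLM, SamplingParams

# A 7B model fits on one H20, so no tensor parallelism is needed (TP=1).
small = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=1)

# A 72B model is sharded across 4 GPUs (TP=4); each GPU holds ~1/4 of
# the weights and they exchange activations over NVLink/PCIe per layer.
large = LLM(model="Qwen/Qwen2.5-72B-Instruct", tensor_parallel_size=4)

params = SamplingParams(max_tokens=32)
out = large.generate(["Hello"], params)
print(out[0].outputs[0].text)
```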