521 points hd4 | 2 comments
1. djoldman No.45643948
Key paragraph:

> However, a small handful of models such as Alibaba’s Qwen and DeepSeek are most popular for inference, with most other models only sporadically called upon. This leads to resource inefficiency, with 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found.

replies(1): >>45648070
2. make3 No.45648070
These other models are likely much smaller.