This looks fun. The author mentions machine learning workloads. What are typical machine learning use cases for a cluster of lower-end GPUs?
While we're on that topic, why must inference on large models be done on a single large GPU and/or a single bank of memory rather than a cluster of them? Is there any promise of eventually being able to run large models on clusters of weaker GPUs?
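To make the question concrete, here's a minimal sketch (assuming PyTorch and two visible CUDA devices; the class name, layer sizes, and split point are made up for illustration) of the naive way you'd split a model across two GPUs. The point is that splitting is mechanically easy, but every forward pass has to hop the activation across the device interconnect, which is presumably where clusters of weak GPUs struggle:

    # Naive pipeline parallelism: first half of the layers on GPU 0,
    # second half on GPU 1. Each GPU only needs to hold its share of
    # the weights, but the activation crosses the PCIe/NVLink boundary
    # on every forward pass.
    import torch
    import torch.nn as nn

    class TwoGPUModel(nn.Module):
        def __init__(self, d=4096, n_layers=8):
            super().__init__()
            half = n_layers // 2
            self.part0 = nn.Sequential(
                *[nn.Linear(d, d) for _ in range(half)]).to("cuda:0")
            self.part1 = nn.Sequential(
                *[nn.Linear(d, d) for _ in range(n_layers - half)]).to("cuda:1")

        def forward(self, x):
            x = self.part0(x.to("cuda:0"))
            # Cross-device transfer happens here, once per layer boundary.
            return self.part1(x.to("cuda:1"))

    model = TwoGPUModel()
    out = model(torch.randn(1, 4096))

So I suppose the sharper version of my question is whether that interconnect cost can ever be engineered around for commodity hardware, or whether it's fundamental.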
replies(2):