They are working with tiny models. Not sure how well it'd scale to bigger models (if at all).
replies(1):
> Our current deployment runs in a cross-region cluster comprising 213 H20 GPUs, serving 28 models in the 1.8B–7B range (TP=1) and 19 models in the 32B–72B range (TP=4).
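To make the TP numbers concrete: TP=1 means each small model fits on a single GPU, while TP=4 shards each large model's weights across four GPUs. One copy of every model would take 28×1 + 19×4 = 104 GPUs, so the remaining capacity presumably goes to additional replicas. The reply doesn't name its serving stack; below is a minimal sketch of how such a split might look with vLLM, where the model names and all other specifics are assumptions for illustration, not the poster's actual setup.

```python
# Hypothetical sketch: serving small models at TP=1 and large models at TP=4.
# vLLM is assumed here; the original comment does not say what they use.
from vllm import LLM, SamplingParams

# A 7B model fits on one H20, so no tensor parallelism is needed (TP=1).
small = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=1)

# A 72B model is sharded across 4 GPUs (TP=4); each GPU holds ~1/4 of
# the weights and they exchange activations over NVLink/PCIe per layer.
large = LLM(model="Qwen/Qwen2.5-72B-Instruct", tensor_parallel_size=4)

params = SamplingParams(max_tokens=32)
out = large.generate(["Hello"], params)
print(out[0].outputs[0].text)
```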