
602 points | emrah | 1 comment
miki123211 No.43744691
What would be the best way to deploy this if you're maximizing for GPU utilization in a multi-user (API) scenario? Structured output support would be a big plus.

We're working with a GPU-poor organization with very strict data residency requirements, and these models might be exactly what we need.

I would normally suggest vLLM, but the blog post notably does not mention vLLM support.

replies(1): >>43747210
1. PhilippGille No.43747210
vLLM lists Gemma 3 as supported, if I'm not mistaken: https://docs.vllm.ai/en/latest/models/supported_models.html#...
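For the structured-output part of the question: vLLM's OpenAI-compatible server accepts a `guided_json` field for JSON-schema-constrained decoding. A minimal sketch of what such a request body could look like, assuming a Gemma 3 instruct checkpoint is being served (the model name and schema here are illustrative, not from the thread):

```python
import json

# Illustrative JSON schema the model output must conform to.
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative"]},
    },
    "required": ["sentiment"],
}

# Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# "guided_json" is vLLM's extension for schema-constrained structured output;
# the model name is a hypothetical example.
payload = {
    "model": "google/gemma-3-27b-it",
    "messages": [{"role": "user", "content": "Classify: 'Great release!'"}],
    "guided_json": schema,
}

body = json.dumps(payload)
```

With the official `openai` client, the same field would go in `extra_body` when pointing the client at the vLLM server's base URL.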