I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.
I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.
This doesn't match my experience precisely, but I've definitely had cases where some of the providers had consistently worse output for the same model than others, the solution there was to figure out which ones those are and to denylist them in the UI.
As for quantized versions, you can check it for each model and provider, for example: https://openrouter.ai/qwen/qwen3-coder/providers
You can see that these providers run FP4 versions:
* DeepInfra (Turbo)
And these providers run FP8 versions: * Chutes
* GMICloud
* NovitaAI
* Baseten
* Parasail
* Nebius AI Studio
* AtlasCloud
* Targon
* Together
* Hyperbolic
* Cerebras
I will say that it's not all bad and my experience with FP8 output has been pretty decent, especially when I need something done quickly and choose to use Cerebras - provided their service isn't overloaded, their TPS is really, really good.You can also request specific precision on a per request basis: https://openrouter.ai/docs/features/provider-routing#quantiz... (or just make a custom preset)
As for how Qwen3 Coder performs, there's always SWE-bench: https://www.swebench.com/
By the numbers:
* it sits between Gemini 2.5 Pro and GPT-5 mini
* it beats out Kimi K2 and the older Claude Sonnet 3.7
* but loses out to Claude Sonnet 4 and GPT-5
Personally, I find it sufficient for most tasks (from recommendations and questions to as close to vibe coding as I get) on a technical level. GLM 4.5 isn't on the site at the time of writing this, but they should match one another pretty closely. Feeling wise, I still very much prefer Sonnet 4 to everything else, but it's both expensive and way slower than Cerebras (not even close).Update: also seems like the Growth plan on their page says "Starting from 1500 USD / month" which is a bit silly when the new cheapest subscription is 50 USD / month.