GLM 4.5 with Claude Code

(docs.z.ai)

210 points vincirufus | 1 comments | 06 Sep 25 00:45 UTC | HN request time: 0s | source

Show context

chisleu ◴[06 Sep 25 02:07 UTC] No.45145959[source]▶

I've been using GLM 4.5 and GLM 4.5 Air for a while now. The Air model is light enough to run on a macbook pro and is useful for Cline. I can run the full GLM model on my Mac Studio, but the TPS is so slow that it's only useful for chatting. So I hooked up with openrouter to try but didn't have the same success. Any of the open weight models I try with open router give sub standard results. I get better results from Qwen 3 coder 30b a3b locally than I get from Qwen 3 Coder 480b through open router.

I'm really concerned that some of the providers are using quantized versions of the models so they can run more models per card and larger batches of inference.

replies(3): >>45145970 #>>45146999 #>>45149106 #

vincirufus ◴[06 Sep 25 02:10 UTC] No.45145970[source]▶

>>45145959 #

yeah I too have heard similar concerns with Open models on OpenRouter, but haven't been able to verify it, as I don't use that a lot

replies(1): >>45146140 #

numlocked ◴[06 Sep 25 02:37 UTC] No.45146140[source]▶

>>45145970 #

(OpenRouter COO here) We are starting to test this and verify the deployments. More to come on that front -- but long story short is that we don't have good evidence that providers are doing weird stuff that materially affects model accuracy. If you have data points to the contrary, we would love them.

We are heavily incentivized to prioritize/make transparent high-quality inference and have no incentive to offer quantized/poorly-performing alternatives. We certainly hear plenty of anecdotal reports like this, but when we dig in we generally don't see it.

An exception is when a model is first released -- for example this terrific work by artificial analysis: https://x.com/ArtificialAnlys/status/1955102409044398415

It does take providers time to learn how to run the models in a high quality way; my expectation is that the difference in quality will be (or already is) minimal over time. The large variance in that case was because GPT OSS had only been out for a couple of weeks.

For well-established models, our (admittedly limited) testing has not revealed much variance between providers in terms of quality. There is some but it's not like we see a couple of providers 'cheating' by secretly quantizing and clearly serving less intelligence versions of the model. We're going to get more systematic about it though and perhaps will uncover some surprises.

replies(3): >>45146251 #>>45146473 #>>45147084 #

chandureddyvari ◴[06 Sep 25 03:00 UTC] No.45146251{3}[source]▶

>>45146140 #

Unsolicited advice: Why doesn’t open router provide hosting services for OSS models that guarantee non-quantised versions of the LLMs? Would be a win-win for everyone.

replies(2): >>45146312 #>>45146437 #

1. jatins ◴[06 Sep 25 03:13 UTC] No.45146312{4}[source]▶

>>45146251 #

In fact I thought that's what OpenRouter was hosting them all along

↑