579 points by paulpauper | 4 comments
fxtentacle No.43603864
I'd say most of the recent AI model progress has been on price.

A 4-bit quant of QwQ-32B is surprisingly close to Claude 3.5 in coding performance, yet it's small enough to run on a consumer GPU, which means deployment cost is now down to $0.10 per hour (from $12+ for models requiring 8x H100).
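
For reference, running a 4-bit quant like this on a single GPU looks roughly like the following with Hugging Face transformers plus bitsandbytes (a sketch, not my exact setup; llama.cpp with a GGUF quant is the other common route, and the prompt and generation settings here are just illustrative):

    # Sketch: load QwQ-32B in 4-bit on a single GPU using
    # Hugging Face transformers + bitsandbytes.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tok = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/QwQ-32B",
        quantization_config=quant,
        device_map="auto",                      # offloads to CPU if VRAM is tight
    )

    prompt = "Write a function that merges two sorted lists."
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(out[0], skip_special_tokens=True))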

replies(2): >>43603980 >>43604115
1. xiphias2 No.43604115
Have you compared it with 8-bit QwQ-17B?

In my evals, 8-bit quantized smaller Qwen models were better, but then again, evaluating is hard.
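
Roughly what I mean by an 8-bit quantized smaller Qwen model (a sketch; the specific model id is just an example, any of the smaller Qwen checkpoints loads the same way):

    # Sketch: load a smaller Qwen model in 8-bit for comparison.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # illustrative choice
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )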

replies(1): >>43608343
2. redrove No.43608343
There’s no QwQ 17B that I’m aware of. Do you have an HF link?
replies(1): >>43611902
3. xiphias2 No.43611902
You're right, sorry... I only tested Qwen models, not QwQ; I see QwQ only comes in 32B.
replies(1): >>43611938
4. redrove No.43611938
No worries. QwQ is the thinking model from Qwen; it’s a common mix-up.

I think they should’ve named it something else.