579 points by paulpauper | 4 comments
fxtentacle No.43603864
I'd say most of the recent AI model progress has been on price.

A 4-bit quant of QwQ-32B is surprisingly close to Claude 3.5 in coding performance, yet it's small enough to run on a consumer GPU, which means deployment cost is now down to $0.10 per hour (from $12+ for models requiring 8x H100).
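
For reference, running a 4-bit quant like this on a single GPU looks roughly like the following with Hugging Face transformers plus bitsandbytes (a sketch, not my exact setup; llama.cpp with a GGUF quant is the other common route, and the prompt and generation settings here are just illustrative):

    # Sketch: load QwQ-32B in 4-bit on a single GPU using
    # Hugging Face transformers + bitsandbytes.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tok = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/QwQ-32B",
        quantization_config=quant,
        device_map="auto",                      # offloads to CPU if VRAM is tight
    )

    prompt = "Write a function that merges two sorted lists."
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(out[0], skip_special_tokens=True))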

replies(2): >>43603980 >>43604115
1. xiphias2 No.43604115
Have you compared it with 8-bit QwQ-17B?

In my evals, 8-bit quantized smaller Qwen models were better, but then again, evaluating is hard.
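
Roughly what I mean by an 8-bit quantized smaller Qwen model (a sketch; the specific model id is just an example, any of the smaller Qwen checkpoints loads the same way):

    # Sketch: load a smaller Qwen model in 8-bit for comparison.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # illustrative choice
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )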

replies(1): >>43608343
2. redrove No.43608343
There’s no QwQ 17B that I’m aware of. Do you have an HF link?
replies(1): >>43611902
3. xiphias2 No.43611902
You're right, sorry... I only tested Qwen models, not QwQ; I see QwQ only comes in 32B.
replies(1): >>43611938
4. redrove No.43611938
No worries. QwQ is the thinking model from Qwen; it’s a common mix-up.

I think they should’ve named it something else.