(qwen.ai)

314 points by pretext | 5 comments | 10 Dec 25 16:13 UTC

1. banjoe [10 Dec 25 17:20 UTC] No.46220493

Wow, crushing 2.5 Flash on every benchmark is huge. Time to move all of my LLM workloads to a local GPU rig.

replies(3): >>46220593 >>46223561 >>46229791

2. embedding-shape [10 Dec 25 17:27 UTC] No.46220593

>>46220493 (TP)

Just remember to benchmark it yourself first with your private task collection, so you can actually measure them against each other. Pretty much any public benchmark is unreliable at the moment, and making model choices based on others' benchmarks is bound to leave you disappointed.

replies(1): >>46222124

3. MaxikCZ [10 Dec 25 19:10 UTC] No.46222124

>>46220593

This. The latest benchmarks of DSv3.2spe hinted at it beating basically everything, yet in my testing even Sonnet is miles ahead in terms of both speed and accuracy.

4. red2awn [10 Dec 25 20:45 UTC] No.46223561

>>46220493 (TP)

Why would you use an Omni model for a text-only workload? There is Qwen3-30B-A3B.

5. skrunch [11 Dec 25 10:41 UTC] No.46229791

>>46220493 (TP)

Except the image benchmarks are compared against 2.0, and it seems suspicious that they would casually drop to an older model for those.


Qwen3-Omni-Flash-2025-12-01: a next-generation native multimodal large model