Qwen2.5-VL-32B: Smarter and Lighter

(qwenlm.github.io)

544 points tosh | 1 comments | 24 Mar 25 18:35 UTC

ggregoire ◴[24 Mar 25 21:11 UTC] No.43465460[source]▶

We were using Llama 3.2 Vision a few months back and were very frustrated with it (both in terms of speed and result quality). One day we were looking for alternatives on Hugging Face and eventually stumbled upon Qwen. The difference in accuracy and speed absolutely blew our minds. We ask it to find something in an image and we get a response in about half a second on a 4090, and it's correct most of the time. What's even more mind-blowing is that when we ask it to extract any entity name from the image and the entity name is truncated, it gives us the complete name without us even having to ask for it (e.g. if "Coca-C" is barely visible in the background, it will return "Coca-Cola" on its own). And it does this with entities not as well known as Coca-Cola, and with entities known only in some very specific regions too. We haven't looked back at Llama or any other vision model since we tried Qwen.

replies(2): >>43469666 #>>43469677 #

Alifatisk ◴[25 Mar 25 10:21 UTC] No.43469666[source]▶

>>43465460 #

Ever since I switched to Qwen as my go-to, it's been bliss. They have a model for many (if not all) cases. No more daily quota! And you get to use their massive context window (1M tokens).

replies(1): >>43470236 #

Hugsun ◴[25 Mar 25 12:06 UTC] No.43470236[source]▶

>>43469666 #

How are you using them? Who is enforcing the daily quota?

replies(1): >>43473612 #

Alifatisk ◴[25 Mar 25 17:16 UTC] No.43473612[source]▶

>>43470236 #

I use them through chat.qwenlm.ai. What's nice is that you can run your prompt through 3 different modes in parallel to see which one suits that case best.

The daily quota I mentioned is on ChatGPT and Claude; those are very limited in usage (for free users at least, which is understandable), while on Qwen I have felt like I am abusing it with how much I use it. It's very versatile in the sense that it has capabilities like image generation, video generation, a massive context window, and both visual and textual reasoning, all in one place.

Alibaba is really doing something amazing here.
