
544 points tosh | 1 comment
simonw ◴[] No.43464243[source]
32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4 March 2023 level performance, which is when LLMs first got really useful) but small enough you can run them on a single GPU or a reasonably well specced Mac laptop (32GB or more).
replies(9): >>43464289 #>>43464380 #>>43464443 #>>43464588 #>>43464688 #>>43467991 #>>43468940 #>>43469099 #>>43470619 #
faizshah ◴[] No.43464688[source]
I just started self-hosting on my local machine as well; I've been using https://lmstudio.ai/ locally for now.
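(For anyone curious how to script against it: LM Studio exposes an OpenAI-compatible server on localhost, port 1234 by default. A minimal sketch below; the model name is an assumption, substitute whatever you have loaded in the app.)

```python
import json
from urllib import request

def build_chat_request(prompt,
                       model="qwen2.5-32b-instruct",  # assumed name; use your loaded model
                       url="http://localhost:1234/v1/chat/completions"):
    """Build a request against LM Studio's OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Explain KV caching in one paragraph.")
# request.urlopen(req) returns the completion once the local server is running.
```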

I think the 32b models are actually good enough that I might stop paying for ChatGPT plus and Claude.

I get around 20 tok/second on my M3, and I can get 100 tok/second on smaller or quantized models. 80-100 tok/second is the sweet spot for interactive use; above that, you basically can't read as fast as it generates.

I also really like the QwQ reasoning model. I haven't gotten around to trying locally hosted models for agents and RAG yet; coding agents especially are what I'm interested in. I feel like 20 tok/second is fine if it's just running in the background.

Anyways, that was my experience this weekend; would love to hear others'. The way it's going, I really don't see a point in paying. I think on-device is the near future, and providers should just charge a licensing fee for enterprise support and updates, like DB vendors do.

If you were paying $20/mo for ChatGPT a year ago, the 32B models are basically at that level: slightly slower and slightly lower quality, but useful enough to consider cancelling your subscription at this point.

replies(3): >>43464710 #>>43465059 #>>43470007 #
wetwater ◴[] No.43464710[source]
Are there any good sources I can read up on for estimating the hardware specs required to run 7B, 13B, 32B, etc. models locally? I'm a grad student on a budget, but I want to host one locally and am trying to build a PC that could run one of these models.
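(The usual back-of-the-envelope estimate: parameter count times bytes per weight at your quantization level, plus some headroom for KV cache and activations. A rough sketch, where the ~20% overhead factor is a rule of thumb, not an exact figure:)

```python
def estimate_vram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough memory estimate for running an LLM locally:
    weights at the given quantization width, plus ~20% headroom
    for KV cache and activations (a rule of thumb, not exact)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight * overhead

for size in (7, 13, 32):
    print(f"{size}B @ 4-bit: ~{estimate_vram_gb(size):.1f} GB")
# 7B @ 4-bit: ~4.2 GB
# 13B @ 4-bit: ~7.8 GB
# 32B @ 4-bit: ~19.2 GB
```

So a 32B model at 4-bit quantization wants roughly 20 GB of RAM/VRAM, which is why 24 GB GPUs and 32 GB Macs come up so often in these threads.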
replies(6): >>43464785 #>>43464973 #>>43464999 #>>43465270 #>>43465970 #>>43468258 #
faizshah ◴[] No.43464999[source]
Go to r/LocalLLaMA; they have the most info. There are also lots of good YouTube channels that have benchmarked Mac Minis for this (another good-value option with the student discount).

Since you're a student, most of the providers/clouds offer student credits, and you can also get loads of credits from hackathons.