Qwen2.5-VL-32B: Smarter and Lighter

(qwenlm.github.io)

544 points tosh | 1 comments | 24 Mar 25 18:35 UTC | HN request time: 0.201s | source

Show context

simonw ◴[24 Mar 25 18:53 UTC] No.43464243[source]▶

32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4 March 2023 level performance, which is when LLMs first got really useful) but small enough you can run them on a single GPU or a reasonably well specced Mac laptop (32GB or more).

replies(9): >>43464289 #>>43464380 #>>43464443 #>>43464588 #>>43464688 #>>43467991 #>>43468940 #>>43469099 #>>43470619 #

faizshah ◴[24 Mar 25 19:41 UTC] No.43464688[source]▶

>>43464243 #

I just started self hosting as well on my local machine, been using https://lmstudio.ai/ Locally for now.

I think the 32b models are actually good enough that I might stop paying for ChatGPT plus and Claude.

I get around 20 tok/second on my m3 and I can get 100 tok/second on smaller models or quantized. 80-100 tok/second is the best for interactive usage if you go above that you basically can’t read as fast as it generates.

I also really like the QwQ reaoning model, I haven’t gotten around to try out using locally hosted models for Agents and RAG especially coding agents is what im interested in. I feel like 20 tok/second is fine if it’s just running in the background.

Anyways would love to know others experiences, that was mine this weekend. The way it’s going I really dont see a point in paying, I think on-device is the near future and they should just charge a licensing fee like DB provider for enterprise support and updates.

If you were paying $20/mo for ChatGPT 1 year ago, the 32b models are basically at that level but slightly slower and slightly lower quality but useful enough to consider cancelling your subscriptions at this point.

replies(3): >>43464710 #>>43465059 #>>43470007 #

wetwater ◴[24 Mar 25 19:44 UTC] No.43464710[source]▶

>>43464688 #

Are there any good sources that I can read up on estimiating what would be hardware specs required for 7B, 13B, 32B .. etc size If I need to run them locally? I am grad student on budget but I want to host one locally and trying to build a PC that could run one of these models.

replies(6): >>43464785 #>>43464973 #>>43464999 #>>43465270 #>>43465970 #>>43468258 #

1. disgruntledphd2 ◴[24 Mar 25 19:54 UTC] No.43464785[source]▶

>>43464710 #

MacBook with 64gb RAM will probably be the easiest. As a bonus, you can train pytorch models on the built in GPU.

It's really frustrating that I can't just write off Apple as evil monopolists when they put out hardware like this.

↑