
544 points tosh | 4 comments
simonw No.43464243
32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4 March 2023 level performance, which is when LLMs first got really useful) but small enough that you can run them on a single GPU or a reasonably well-specced Mac laptop (32GB of RAM or more).
wetwater No.43464588
I've only recently started looking into running these models locally on my system. I have limited knowledge of LLMs and even less when it comes to building my own PC.

Are there any good sources I can read up on for estimating the hardware specs required for 7B, 13B, 32B, etc. model sizes if I want to run them locally?

TechDebtDevin No.43464644
VRAM Required = Number of Parameters (in billions) × Number of Bytes per Parameter × Overhead[0].

[0]: https://twm.me/posts/calculate-vram-requirements-local-llms/
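That rule of thumb can be sketched in a few lines. The default values below are assumptions for illustration (fp16 weights, ~20% overhead), not fixed constants from the linked post:

```python
def estimate_vram_gb(params_billions, bytes_per_param=2.0, overhead=1.2):
    """Rough VRAM estimate for holding model weights.

    bytes_per_param: 2 for fp16/bf16, 1 for 8-bit, 0.5 for 4-bit quantization.
    overhead: multiplier for activations, buffers, etc. (~20% assumed here).
    """
    return params_billions * bytes_per_param * overhead

# A 7B model at fp16: roughly 7 * 2 * 1.2 = 16.8 GB
print(estimate_vram_gb(7))

# A 32B model with 4-bit quantization: roughly 32 * 0.5 * 1.2 = 19.2 GB
print(estimate_vram_gb(32, bytes_per_param=0.5))
```

This is why quantization matters so much for local use: dropping from fp16 to 4-bit brings a 32B model within reach of a single 24GB GPU.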

wetwater No.43464872
That's neat! Thanks.
manmal No.43465021
Don’t forget to add a lot of extra space if you want a usable context size.
TechDebtDevin No.43465898
Wouldn't that be your overhead variable?
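You could fold it in, but the context cost isn't a fixed multiplier: the KV cache grows linearly with context length, so it's better modelled as a separate term. A rough sketch, assuming a Llama-style transformer with grouped-query attention (the example config below is illustrative, not any specific model's published numbers):

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV-cache size for one sequence.

    Each layer stores a K and a V tensor of shape
    (context_len, n_kv_heads, head_dim), hence the factor of 2.
    bytes_per_elem: 2 for fp16 cache, 1 for an 8-bit quantized cache.
    """
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return kv_bytes / 1e9

# Hypothetical 32B-class config: 64 layers, 8 KV heads (GQA), 128-dim heads.
# At a 32k context in fp16 this is roughly 8.6 GB on top of the weights.
print(kv_cache_gb(64, 8, 128, 32768))
```

So the overhead multiplier covers fixed costs reasonably well, but for long contexts the cache can rival the weights themselves, which is why manmal's point is worth budgeting for separately.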