
544 points by tosh | 2 comments
simonw:
32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4 March 2023 level performance, which is when LLMs first got really useful) but small enough that you can run them on a single GPU or a reasonably well-specced Mac laptop (32GB of RAM or more).
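Back-of-the-envelope arithmetic behind the "32GB or more" claim, as a sketch: the bits-per-weight figures below are approximate averages for common GGUF quantisation formats, and KV cache plus runtime overhead come on top of the weights.

```python
# Rough VRAM/RAM estimate for a 32B-parameter model at common
# llama.cpp quantisation levels. Bits-per-weight values are
# approximate GGUF averages, not exact per-model figures.
QUANT_BITS = {
    "F16":    16.0,
    "Q8_0":    8.5,
    "Q5_K_M":  5.7,
    "Q4_K_M":  4.8,
}

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in QUANT_BITS.items():
    print(f"32B @ {name:7s} ≈ {weights_gb(32, bits):5.1f} GB of weights")

# 32B @ F16     ≈  64.0 GB of weights
# 32B @ Q8_0    ≈  34.0 GB of weights
# 32B @ Q5_K_M  ≈  22.8 GB of weights
# 32B @ Q4_K_M  ≈  19.2 GB of weights
```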
osti:
Are 5090s able to run 32B models?
regularfry:
The 4090 (24GB) can run 32B models at Q4_K_M, so yes, by that measure. Not unquantised, though: even on a 32GB card nothing much bigger than Q8 would fit. What the 32GB card does give you is more room to trade off quantisation against context length.
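A minimal sketch of that quantisation-vs-context trade-off, assuming a GQA architecture loosely modelled on a Qwen2.5-32B-class model (64 layers, 8 KV heads, head dim 128; these are assumed numbers for illustration, not measured from any specific model):

```python
# How much fp16 KV cache fits after the weights, on a fixed VRAM budget.
# Architecture constants below are assumptions for a typical 32B GQA model.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
KV_BYTES = 2  # fp16 KV cache; roughly halve for a q8_0 KV cache

def kv_cache_gb(ctx_tokens: int) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes * tokens."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES * ctx_tokens / 1e9

def max_context(vram_gb: float, weights_gb: float, overhead_gb: float = 1.5) -> int:
    """Tokens of fp16 KV cache that fit after weights and runtime overhead."""
    free_bytes = (vram_gb - weights_gb - overhead_gb) * 1e9
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES
    return max(0, int(free_bytes / per_token))

# Same Q4_K_M 32B weights (~19.2 GB) on a 24GB 4090 vs a 32GB 5090:
print(max_context(24, 19.2))   # ~12k tokens of context
print(max_context(32, 19.2))   # ~43k tokens of context
```

Under these assumptions the extra 8GB goes almost entirely to context, which is the trade-off being described: same quant, several times the usable context window.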