
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 1 comment
leopoldj ◴[] No.42962843[source]
>it can run on my laptop

Has anyone run it on a laptop (unquantized)? The disk size of the 32B model appears to be 80GB. Update: I'm using a 40GB A100 GPU. Loading the model took 30GB of VRAM. I asked a simple question: "How many r in raspberry". After 5 minutes, nothing had been generated beyond the prompt. I'm not sure how the author ran this on a laptop.
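
A rough sketch of the weight-memory math for a 32B-parameter model (standard bytes-per-parameter figures, not from the article; KV cache and activation overhead ignored) shows why the unquantized weights don't fit in 40GB of VRAM but a 4-bit quant can fit on a laptop:

    # Back-of-the-envelope: weight memory for a 32B-parameter model
    # at different precisions (KV cache and activations ignored).
    PARAMS = 32e9

    for label, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        gb = PARAMS * bytes_per_param / 1e9
        print(f"{label:>9}: ~{gb:.0f} GB for the weights alone")

    # fp32      : ~128 GB
    # bf16/fp16 : ~64 GB  -- more than a 40GB A100 can hold without offloading
    # int8      : ~32 GB
    # 4-bit     : ~16 GB  -- small enough for a 24GB GPU or a high-RAM laptop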

replies(1): >>42965317 #
coder543 ◴[] No.42965317[source]
32B models are easy to run on 24GB of RAM at a 4-bit quant.

If you're having trouble, it sounds like you should experiment with some of the existing 32B models that have better documentation on how to run them, but it is entirely plausible to run this on a laptop.

I can run Qwen2.5-Instruct-32B-q4_K_M at 22 tokens per second on just an RTX 3090.
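
For reference, a minimal llama-cpp-python sketch of running a q4_K_M GGUF quant like that (the model file name is a placeholder, and speed will vary with hardware and context size):

    # Minimal sketch: run a 4-bit GGUF quant of a 32B model with llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-32b-instruct-q4_k_m.gguf",  # placeholder path to the quantized weights
        n_gpu_layers=-1,   # offload all layers to the GPU (reduce if VRAM is tight)
        n_ctx=4096,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "How many r in raspberry"}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])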

replies(1): >>42966020 #
leopoldj ◴[] No.42966020[source]
My question was about running it unquantized. The author of the article didn't say how he ran it. If he quantized it, then saying he ran it on a laptop is not news.
replies(2): >>42966043 #>>42967382 #
kristianp ◴[] No.42967382[source]
Maybe he has a 64GB laptop. Also, he said he can run it, not that he actually tried it.