
1311 points msoad | 1 comment
TaylorAlexander (No.35394064)
Great to see this advancing! I'm curious whether anyone knows the best repo for running this stuff on an Nvidia GPU with 16GB of VRAM. I ran the official repo with the leaked weights, and the largest model I could run was the 7B-parameter one. Have people found ways to fit the larger models on such a system?
terafo (No.35394117)
I'd assume the 33B model should fit with this (the only repo I know of that implements SparseGPT and GPTQ for LLaMA), though I haven't tried it myself. You can try your luck: https://github.com/lachlansneff/sparsellama
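For context on why quantization matters here, a back-of-the-envelope sketch of weight memory alone (this ignores the KV cache, activations, and framework overhead, so real usage runs higher; the numbers are illustrative, not measured):

```python
# Approximate VRAM needed just to hold the weights of a model with
# `params_billion` parameters stored at `bits_per_weight` bits each.
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 13, 33):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: {weight_gb(params, bits):5.1f} GB")
```

This matches the experience above: 7B at fp16 is ~14 GB, which just squeezes into 16 GB, while 33B at 4-bit GPTQ is ~16.5 GB for weights alone, which is why combining quantization with SparseGPT-style sparsity (pruned weights need not be stored densely) is what would make a 33B model plausible on such a card.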