
1311 points by msoad | 4 comments
1. TaylorAlexander No.35394064
Great to see this advancing! I'm curious if anyone knows the best repo for running this on an NVIDIA GPU with 16GB of VRAM. I ran the official repo with the leaked weights, and the largest model I could run was the 7B-parameter one. I'm curious whether people have found ways to fit the larger models on such a system.
replies(2): >>35394117, >>35394765
2. terafo No.35394117
I'd assume the 33B model should fit with this (the only repo I know of that implements SparseGPT and GPTQ for LLaMA); I haven't tried it personally, though. But you can try your luck: https://github.com/lachlansneff/sparsellama
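
(Aside: a rough sketch of why 4-bit quantization is what makes 33B plausible on a 16GB card. Weight memory is roughly parameter count × bytes per weight; the numbers below cover weights only and ignore activations and the KV cache, which add a few more GB at runtime.)

    # Back-of-envelope VRAM needed for LLaMA weights alone.
    BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

    for params_b in (7, 13, 33, 65):  # billions of parameters
        row = ", ".join(
            f"{fmt}: {params_b * bpp:5.1f} GB"
            for fmt, bpp in BYTES_PER_PARAM.items()
        )
        print(f"LLaMA-{params_b}B -> {row}")

    # LLaMA-33B at 4 bits is ~16.5 GB of weights, right at the edge of a
    # 16GB card -- which is why pruning weights with SparseGPT on top of
    # GPTQ quantization can be what pushes it under the limit.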
3. enlyth No.35394765
https://github.com/oobabooga/text-generation-webui
replies(1): >>35397820
4. TaylorAlexander No.35397820
Looks great, thank you!