
1311 points msoad | 5 comments
brucethemoose2 ◴[] No.35393393[source]
Does that also mean 6GB VRAM?

And does that include Alpaca models like this? https://huggingface.co/elinas/alpaca-30b-lora-int4

replies(2): >>35393441 #>>35393450 #
1. sp332 ◴[] No.35393450[source]
According to https://mobile.twitter.com/JustineTunney/status/164190201019... you can probably use the conversion tools from the repo on Alpaca and get the same result.
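
For reference, a rough sketch of what that conversion flow looks like. The script name convert-pth-to-ggml.py and the quantize binary are assumptions based on the llama.cpp repo layout at the time and may have changed since, and the model path is a placeholder:

    # Sketch only: drive llama.cpp's conversion + 4-bit quantization from Python.
    # Helper script and binary names are assumptions; adjust to the repo you have.
    import subprocess

    MODEL_DIR = "./models/alpaca-30b"  # placeholder: directory holding the .pth weights

    # 1. Convert the PyTorch checkpoint to an fp16 ggml file.
    subprocess.run(["python3", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

    # 2. Quantize the fp16 file down to 4-bit (q4_0) so it fits in much less memory.
    subprocess.run(
        ["./quantize",
         f"{MODEL_DIR}/ggml-model-f16.bin",
         f"{MODEL_DIR}/ggml-model-q4_0.bin",
         "2"],  # "2" selected the q4_0 format in that version of the tool
        check=True,
    )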

If you want to run larger Alpaca models on a low VRAM GPU, try FlexGen. I think https://github.com/oobabooga/text-generation-webui/ is one of the easier ways to get that going.
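
This is not FlexGen itself, but here is a minimal sketch of the same low-VRAM idea using the transformers/accelerate offload path that the webui also wraps. The checkpoint name and memory limits below are placeholders, not recommendations:

    # Sketch: split a model between a small GPU and system RAM via accelerate.
    # "chavinlo/alpaca-native" is just an example name; swap in whatever Alpaca
    # variant you actually have, and tune max_memory to your hardware.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "chavinlo/alpaca-native"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.float16,
        device_map="auto",                        # let accelerate place layers
        max_memory={0: "6GiB", "cpu": "48GiB"},   # cap VRAM, spill the rest to RAM
    )

    prompt = "Below is an instruction that describes a task."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))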

replies(3): >>35393841 #>>35396847 #>>35397363 #
2. brucethemoose2 ◴[] No.35393841[source]
Yeah, or DeepSpeed presumably. Maybe torch.compile too.
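
For concreteness, a hedged sketch of both options. deepspeed.init_inference and torch.compile are real entry points, but exact arguments vary by version, and the checkpoint name below is just an example:

    # Sketch: two ways to speed up inference on a Hugging Face LLaMA-style model.
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",  # example checkpoint name
        torch_dtype=torch.float16,
    )

    # Option A: DeepSpeed inference, with fused/injected kernels.
    ds_model = deepspeed.init_inference(
        model, dtype=torch.half, replace_with_kernel_inject=True
    )

    # Option B: torch.compile (PyTorch 2.0+), which graph-compiles the forward pass.
    compiled_model = torch.compile(model)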

I dunno why I thought llama.cpp would support GPUs. shrug

replies(1): >>35395707 #
3. sp332 ◴[] No.35395707[source]
Lots of C++ programs use the GPU. The language is irrelevant.
4. KingMachiavelli ◴[] No.35396847[source]
I think those specific Alpaca models are all in safetensors format now, and there isn't a simple converter to ggml.
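
One possible workaround (a sketch, assuming the repo ships plain safetensors weights; the 4-bit GPTQ quantization in that particular model would not survive this route) is to rewrite the safetensors file as a regular PyTorch checkpoint that the existing pth-to-ggml converter understands:

    # Sketch: safetensors -> .pth so llama.cpp's converter can read it.
    # File names are placeholders for whatever the repo actually contains.
    import torch
    from safetensors.torch import load_file

    state_dict = load_file("alpaca-30b.safetensors")  # placeholder input name
    torch.save(state_dict, "consolidated.00.pth")     # name the LLaMA tooling expects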
5. sp332 ◴[] No.35397363[source]
Late edit: DeepSpeed, not FlexGen. I don't know if FlexGen could work, but that repo only supports it for OPT models.