
1311 points msoad | 5 comments
brucethemoose2 ◴[] No.35393393[source]
Does that also mean 6GB VRAM?

And does that include Alpaca models like this? https://huggingface.co/elinas/alpaca-30b-lora-int4

replies(2): >>35393441 #>>35393450 #
1. sp332 ◴[] No.35393450[source]
According to https://mobile.twitter.com/JustineTunney/status/164190201019... you can probably use the conversion tools from the repo on Alpaca and get the same result.
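
For reference, a rough sketch of what that conversion flow looks like. The script name convert-pth-to-ggml.py and the quantize binary are assumptions based on the llama.cpp repo layout at the time and may have changed since, and the model path is a placeholder:

    # Sketch only: drive llama.cpp's conversion + 4-bit quantization from Python.
    # Helper script and binary names are assumptions; adjust to the repo you have.
    import subprocess

    MODEL_DIR = "./models/alpaca-30b"  # placeholder: directory holding the .pth weights

    # 1. Convert the PyTorch checkpoint to an fp16 ggml file.
    subprocess.run(["python3", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

    # 2. Quantize the fp16 file down to 4-bit (q4_0) so it fits in much less memory.
    subprocess.run(
        ["./quantize",
         f"{MODEL_DIR}/ggml-model-f16.bin",
         f"{MODEL_DIR}/ggml-model-q4_0.bin",
         "2"],  # "2" selected the q4_0 format in that version of the tool
        check=True,
    )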

If you want to run larger Alpaca models on a low VRAM GPU, try FlexGen. I think https://github.com/oobabooga/text-generation-webui/ is one of the easier ways to get that going.
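
This is not FlexGen itself, but here is a minimal sketch of the same low-VRAM idea using the transformers/accelerate offload path that the webui also wraps. The checkpoint name and memory limits below are placeholders, not recommendations:

    # Sketch: split a model between a small GPU and system RAM via accelerate.
    # "chavinlo/alpaca-native" is just an example name; swap in whatever Alpaca
    # variant you actually have, and tune max_memory to your hardware.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "chavinlo/alpaca-native"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.float16,
        device_map="auto",                        # let accelerate place layers
        max_memory={0: "6GiB", "cpu": "48GiB"},   # cap VRAM, spill the rest to RAM
    )

    prompt = "Below is an instruction that describes a task."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))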

replies(3): >>35393841 #>>35396847 #>>35397363 #
2. brucethemoose2 ◴[] No.35393841[source]
Yeah, or DeepSpeed presumably. Maybe torch.compile too.
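
For concreteness, a hedged sketch of both options. deepspeed.init_inference and torch.compile are real entry points, but exact arguments vary by version, and the checkpoint name below is just an example:

    # Sketch: two ways to speed up inference on a Hugging Face LLaMA-style model.
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",  # example checkpoint name
        torch_dtype=torch.float16,
    )

    # Option A: DeepSpeed inference, with fused/injected kernels.
    ds_model = deepspeed.init_inference(
        model, dtype=torch.half, replace_with_kernel_inject=True
    )

    # Option B: torch.compile (PyTorch 2.0+), which graph-compiles the forward pass.
    compiled_model = torch.compile(model)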

I dunno why I thought llama.cpp would support GPUs. shrug

replies(1): >>35395707 #
3. sp332 ◴[] No.35395707[source]
Lots of C++ programs use the GPU. The language is irrelevant.
4. KingMachiavelli ◴[] No.35396847[source]
I think those specific Alpaca models are all in safetensors format now, and there isn't a simple converter to ggml.
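
One possible workaround (a sketch, assuming the repo ships plain safetensors weights; the 4-bit GPTQ quantization in that particular model would not survive this route) is to rewrite the safetensors file as a regular PyTorch checkpoint that the existing pth-to-ggml converter understands:

    # Sketch: safetensors -> .pth so llama.cpp's converter can read it.
    # File names are placeholders for whatever the repo actually contains.
    import torch
    from safetensors.torch import load_file

    state_dict = load_file("alpaca-30b.safetensors")  # placeholder input name
    torch.save(state_dict, "consolidated.00.pth")     # name the LLaMA tooling expects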
5. sp332 ◴[] No.35397363[source]
Late edit: DeepSpeed, not FlexGen. I don't know if FlexGen could work, but that repo only supports it for OPT models.