Could 16 GB of VRAM be enough for the 27B QAT version?
replies(5):
Prompt Tokens: 10
Time: 229.089 ms
Speed: 43.7 t/s
Generation Tokens: 41
Time: 959.412 ms
Speed: 42.7 t/s
Best to have two or more low-end 16 GB GPUs, for a total of 32 GB of VRAM or more, to run most of the better local models.
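If you want a bit more context, try -ctk q8_0 -ctv q8_0 (assuming llama.cpp, where these are short for --cache-type-k and --cache-type-v; from memory, so look it up) to quantize the KV cache. As far as I know, a quantized V cache also needs flash attention (-fa) enabled.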
Also, an imatrix GGUF like IQ4_XS might be smaller with better quality.
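
For a rough feel of why 16 GB is tight, here is a back-of-the-envelope sketch in Python. It only multiplies element counts by the nominal bits per weight of a few GGUF block formats (f16 = 16, q8_0 = 8.5, q4_0 = 4.5, iq4_xs = 4.25). The function names are made up for illustration, and the layer/head numbers in the usage lines are placeholders, not the real dimensions of any particular 27B model. Real GGUF files mix tensor types and runtimes need extra compute buffers, so treat the results as lower bounds.

```python
# Rough VRAM estimate. Helper names and the example model dimensions are
# illustrative assumptions, not taken from any specific model or tool.
GIB = 1024 ** 3

# Nominal bits per element for a few GGUF block formats
# (blocks of 32 values plus per-block scale data).
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5, "iq4_xs": 4.25}


def weight_bytes(n_params: float, quant: str) -> float:
    """Approximate size of the weights if every tensor used one format."""
    return n_params * BITS_PER_ELEMENT[quant] / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, cache_type: str) -> float:
    """Approximate KV cache size for full attention over n_ctx tokens."""
    # K and V each hold n_kv_heads * head_dim values per layer per token.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BITS_PER_ELEMENT[cache_type] / 8


if __name__ == "__main__":
    for quant in ("q4_0", "iq4_xs"):
        print(f"27B weights @ {quant}: {weight_bytes(27e9, quant) / GIB:.1f} GiB")

    # Placeholder dimensions: substitute the values from your model's metadata.
    dims = dict(n_layers=60, n_kv_heads=16, head_dim=128, n_ctx=8192)
    for cache_type in ("f16", "q8_0"):
        size = kv_cache_bytes(cache_type=cache_type, **dims)
        print(f"KV cache @ {cache_type}: {size / GIB:.2f} GiB")
```

With these nominal figures the weights alone come to roughly 14.1 GiB at q4_0 and 13.4 GiB at iq4_xs, which leaves little of a 16 GB card for context. Switching the cache from f16 to q8_0 cuts its size to about 53% (8.5/16) regardless of the model dimensions.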