602 points emrah | 3 comments | source
holografix ◴[] No.43743631[source]
Could 16gb vram be enough for the 27b QAT version?
replies(5): >>43743634 #>>43743704 #>>43743825 #>>43744249 #>>43756253 #
1. hskalin ◴[] No.43743825[source]
With ollama you can offload some layers to the CPU if the model doesn't fit in VRAM. This costs some performance, of course, but it's much better than the alternative (running everything on the CPU).
replies(2): >>43744666 #>>43752342 #
2. senko ◴[] No.43744666[source]
I'm doing that with a 12GB card, ollama supports it out of the box.

For some reason it only uses around 7GB of VRAM, probably due to how the layers are scheduled. Maybe I could tweak something there, but I didn't bother just for testing.

Obviously, perf depends on CPU, GPU and RAM, but on my machine (3060 + i5-13500) it's around 2 t/s.
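The tweak in question is ollama's num_gpu option, which caps how many layers are kept on the GPU (the rest run on the CPU). A minimal sketch of setting it via a Modelfile — the model tag and the layer count of 24 are assumptions, not recommendations; tune the number until VRAM stops overflowing:

```
# Sketch of a Modelfile overriding GPU layer offload.
# num_gpu = number of layers placed on the GPU; 24 is a guess for a 12GB card.
FROM gemma3:27b-it-qat
PARAMETER num_gpu 24
```

Build it with `ollama create my-gemma -f Modelfile` and run `ollama run my-gemma`; the same option can also be passed per-request in the API's options field.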

3. dockerd ◴[] No.43752342[source]
Does it work in LM Studio? Loading 27b-it-qat takes up more than 22GB on a 24GB Mac.