
602 points by emrah
holografix:
Could 16 GB of VRAM be enough for the 27B QAT version?
parched99:
I am only able to get Gemma-3-27b-it-qat-Q4_0.gguf (15.6 GB) to run with a 100-token context size on a 5070 Ti (16 GB) using llama.cpp (a rough invocation is sketched after the numbers below).

Prompt:     10 tokens,  229.089 ms,  43.7 t/s
Generation: 41 tokens,  959.412 ms,  42.7 t/s

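For reference, here is a minimal sketch of that kind of setup using the llama-cpp-python bindings (an assumption; the original comment only says llama.cpp, and the model directory and prompt below are placeholders). The point is simply that the Q4_0 weights alone take ~15.6 GB, so on a 16 GB card the context has to stay tiny for the KV cache and compute buffers to fit in what's left:

    # Back-of-envelope: ~15.6 GB of weights on a 16 GB card leaves only a few
    # hundred MB for KV cache and buffers, hence the ~100-token context.
    from llama_cpp import Llama  # assumes the llama-cpp-python bindings

    llm = Llama(
        model_path="models/gemma-3-27b-it-qat-Q4_0.gguf",  # placeholder path
        n_ctx=100,        # tiny context window, as in the numbers above
        n_gpu_layers=-1,  # offload every layer to the GPU
    )

    out = llm("Why is the sky blue?", max_tokens=41)
    print(out["choices"][0]["text"])

With a larger n_ctx the KV cache no longer fits alongside the weights, which matches the 100-token limit reported above.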
floridianfisher:
Try one of the smaller versions. 27B is too big for your GPU.
parched99:
I'm aware. I was addressing the question being asked.