
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points by wertyk | 1 comment
danielhanchen ◴[] No.44978800[source]
For local runs, I made some GGUFs! For good performance with the dynamic 2-bit quant (2-bit MoE layers, 6-8-bit for the rest) you need around RAM + VRAM >= 250GB - you can also offload to SSD, but it'll be slow.

# -ngl 99 offloads all layers to GPU; the -ot regex keeps the MoE expert tensors in CPU RAM
./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
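If you'd rather hit it over HTTP than the interactive CLI, the same offload pattern should also work with llama-server - a minimal sketch, assuming the flags mirror the llama-cli run above (the port choice is arbitrary):

# Serves an OpenAI-compatible API; same GPU offload + MoE-experts-to-CPU override as above
./llama.cpp/llama-server -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU" --port 8080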

replies(6): >>44979837 #>>44980406 #>>44981373 #>>44982860 #>>44984274 #>>44987809 #
efilife ◴[] No.44982860[source]
>250GB

How do you guys run this stuff?
replies(1): >>44983044 #
danielhanchen ◴[] No.44983044[source]
I'm working on sub-165GB ones!

A 165GB quant will need a 24GB GPU + 141GB of RAM for reasonably fast inference, or a Mac with enough unified memory.
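Once those land, the invocation should look the same with a smaller quant tag swapped in - the tag below is a placeholder, not a published one, so check the unsloth docs for the real name:

# Hypothetical: same offload pattern with a smaller dynamic quant (placeholder tag)
./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-SMALLER-TAG -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"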