
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points wertyk | 3 comments
danielhanchen ◴[] No.44978800[source]
For local runs, I made some GGUFs! You need around RAM + VRAM >= 250GB for good perf with the dynamic 2-bit quant (2-bit MoE experts, 6-8-bit for the rest) - you can also do SSD offloading, but it'll be slow.
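As a back-of-the-envelope check on that ~250GB figure, here's a rough sketch. The expert fraction and effective bit-widths below are illustrative guesses for a mixed-precision quant of a ~671B-parameter model, not Unsloth's actual recipe:

```python
def estimate_gguf_gb(total_params: float, expert_frac: float,
                     expert_bits: float, other_bits: float) -> float:
    """Rough GGUF size: weighted-average bits per weight, divided by 8 bits/byte."""
    avg_bits = expert_frac * expert_bits + (1 - expert_frac) * other_bits
    return total_params * avg_bits / 8 / 1e9

# DeepSeek-V3.1 has ~671B total parameters. Assume (illustratively) that
# ~95% of the weights sit in the routed MoE experts at an effective
# ~2.7 bits/weight, with the remaining tensors kept at ~7 bits/weight.
size = estimate_gguf_gb(671e9, expert_frac=0.95, expert_bits=2.7, other_bits=7.0)
print(f"~{size:.0f} GB")  # lands in the same ballpark as the 250GB guidance
```

Under those assumptions the estimate comes out near the stated RAM + VRAM budget; the real number depends on which tensors get which quant type.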

./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"
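The `-ot` (`--override-tensor`) part is what keeps the MoE expert tensors in system RAM while the rest goes to the GPU. Assuming llama.cpp matches that regex against GGUF tensor names, here's a quick sanity check of what the pattern catches (the tensor names below are typical GGUF MoE naming examples, not an exhaustive list):

```python
import re

# The pattern from the -ot override above: any tensor whose name
# contains "ffn_...._exps" (the routed-expert FFN weights) is placed on CPU.
pattern = re.compile(r".ffn_.*_exps.")

names = [
    "blk.0.ffn_gate_exps.weight",  # routed-expert tensor -> CPU
    "blk.0.ffn_down_exps.weight",  # routed-expert tensor -> CPU
    "blk.0.attn_q.weight",         # attention tensor, stays on GPU
]
on_cpu = [n for n in names if pattern.search(n)]
```

So the attention and shared tensors stay on the GPU (`-ngl 99`), and only the big, sparsely-activated expert matrices are offloaded.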

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1

replies(6): >>44979837 #>>44980406 #>>44981373 #>>44982860 #>>44984274 #>>44987809 #
tw1984 ◴[] No.44980406[source]
for such a dynamic 2-bit quant, are there any benchmark results showing how much performance I would give up compared to the original model? thanks.
replies(2): >>44980677 #>>44984158 #
1. danielhanchen ◴[] No.44980677[source]
Currently no, but I'm running them! Some people on the aider discord are running some benchmarks!
replies(1): >>44988658 #
2. cowpig ◴[] No.44988658[source]
@danielhanchen do you publish the benchmarks you run anywhere?
replies(1): >>45000148 #
3. danielhanchen ◴[] No.45000148[source]
We had benchmarks for Llama 4 and Gemma 3 at https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs - for others I normally refer to https://discord.com/channels/1131200896827654144/12822404236... which is the Aider Polyglot Discord - they always benchmark our quants :)