(api-docs.deepseek.com)

776 points wertyk | 3 comments | 21 Aug 25 19:06 UTC


danielhanchen ◴[21 Aug 25 22:21 UTC] No.44978800[source]▶

For local runs, I made some GGUFs! You need roughly RAM + VRAM >= 250GB combined for good perf with the dynamic 2-bit quant (2-bit MoE layers, 6-8 bit for the rest). SSD offloading also works, but it'll be slow.

./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
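To make the `-ot ".ffn_.*_exps.=CPU"` override in the command above more concrete, here is a minimal Python sketch of how such a pattern would split tensors between CPU and GPU. It assumes the part before `=` is applied as a regex search against each tensor name, and the tensor names listed are illustrative examples in the style of MoE GGUF files, not a dump from this model.

```python
import re

# Regex part of the -ot argument; the "=CPU" suffix names the target buffer.
PATTERN = re.compile(r".ffn_.*_exps.")

# Illustrative tensor names (assumed for this sketch, not read from the GGUF).
tensor_names = [
    "blk.3.attn_q.weight",
    "blk.3.ffn_gate_exps.weight",
    "blk.3.ffn_up_exps.weight",
    "blk.3.ffn_down_exps.weight",
    "blk.3.ffn_gate_shexp.weight",
    "token_embd.weight",
]


def placement(name: str) -> str:
    """Return "CPU" if the override pattern matches the name, else "GPU"."""
    return "CPU" if PATTERN.search(name) else "GPU"


placements = {name: placement(name) for name in tensor_names}

if __name__ == "__main__":
    for name, device in placements.items():
        print(f"{device:3s}  {name}")
```

Under these assumptions, only the routed expert FFN tensors match and stay in system RAM, while attention, shared-expert, and embedding tensors remain eligible for GPU offload via `-ngl 99`.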

replies(6): >>44979837 #>>44980406 #>>44981373 #>>44982860 #>>44984274 #>>44987809 #

tw1984 ◴[22 Aug 25 02:20 UTC] No.44980406[source]▶

>>44978800 #

For such dynamic 2-bit quants, are there any benchmark results showing how much performance I would give up compared to the original model? Thanks.

replies(2): >>44980677 #>>44984158 #

1. danielhanchen ◴[22 Aug 25 03:13 UTC] No.44980677[source]▶

>>44980406 #

Currently no, but I'm running them! Some people on the Aider Discord are running benchmarks too!

replies(1): >>44988658 #

2. cowpig ◴[22 Aug 25 19:15 UTC] No.44988658[source]▶

>>44980677 (TP) #

@danielhanchen do you publish the benchmarks you run anywhere?

replies(1): >>45000148 #

3. danielhanchen ◴[24 Aug 25 00:11 UTC] No.45000148[source]▶

>>44988658 #

We had benchmarks for Llama 4 and Gemma 3 at https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs - for others I normally refer to https://discord.com/channels/1131200896827654144/12822404236... which is the Aider Polyglot Discord - they always benchmark our quants :)


DeepSeek-v3.1