
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points by wertyk | 4 comments
danielhanchen No.44978800
For local runs, I made some GGUFs! For good performance you'll want RAM + VRAM >= 250GB for the dynamic 2-bit quant (2-bit MoE experts, 6-8-bit for everything else). You can also offload to SSD, but it'll be slow.

./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
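
If you'd rather serve it over HTTP than chat in the CLI, the same flags should carry over to llama-server (a sketch under the same assumptions; pick whatever port you like):

./llama.cpp/llama-server -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU" --port 8080

The -ot ".ffn_.*_exps.=CPU" override is what makes the combined 250GB budget work: it pins the MoE expert tensors to system RAM so only the dense layers need to fit in VRAM.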

replies(6): >>44979837 #>>44980406 #>>44981373 #>>44982860 #>>44984274 #>>44987809 #
1. azinman2 No.44987809
It’d also be great if you guys could do a fine-tune that runs on an 8x80GB A100/H100 node. The H200/B200 configs are harder to come by (and much more expensive).
replies(1): >>44987952 #
2. danielhanchen No.44987952
Unsloth should work on any GPU setup, all the way from the old Tesla T4s to the newer B200s :) We're working on a faster and better multi-GPU version, but using accelerate / torchrun manually + Unsloth should work out of the box!
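
Roughly, the manual launch looks like this (a sketch, not an official recipe; train.py here is a placeholder for your own Unsloth fine-tuning script, and the process count assumes the 8-GPU node you mentioned):

# one process per GPU on a single 8x80GB node
torchrun --nproc_per_node 8 train.py

# or equivalently via accelerate
accelerate launch --num_processes 8 train.py
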
replies(1): >>44987969 #
3. azinman2 No.44987969
I guess I was hoping for you guys to put up these weights. I think they’d be popular for these very large models.

You guys already do a lot for the local LLM community and I appreciate it.

replies(1): >>45000138 #
4. danielhanchen No.45000138
I'll see what I can do :)