DeepSeek-v3.1

(api-docs.deepseek.com)

776 points wertyk | 1 comments | 21 Aug 25 19:06 UTC

danielhanchen [21 Aug 25 22:21 UTC] No.44978800

For local runs, I made some GGUFs! For good performance with the dynamic 2-bit quant (2-bit MoE layers, 6-8 bit for the rest), you need roughly RAM + VRAM >= 250GB. You can also offload to SSD, but it'll be slow.
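
A rough way to see where a figure like ~250GB comes from: weight memory is about parameters × bits per weight / 8, summed over the differently quantized parts. The sketch below assumes 671B total parameters, a hypothetical 97% share in the routed MoE experts, ~2.5 effective bits for those experts, and 7 bits (midpoint of the quoted 6-8) for everything else. These are illustrative guesses, not Unsloth's actual recipe, and KV cache plus runtime overhead come on top.

```python
# Back-of-the-envelope weight-memory estimate for a mixed-precision quant.
# ASSUMPTIONS (not from the post): 671e9 total params, a hypothetical 97% of
# them in routed MoE experts, ~2.5 effective bits for experts, 7 bits for the rest.

def weight_gb(n_params: float, bits: float) -> float:
    """Storage for n_params weights at `bits` bits each, in GB (1e9 bytes)."""
    return n_params * bits / 8 / 1e9

def mixed_quant_gb(total_params: float, moe_fraction: float,
                   moe_bits: float, rest_bits: float) -> float:
    """Sum the expert and non-expert parts, each at its own bit width."""
    moe = total_params * moe_fraction
    rest = total_params - moe
    return weight_gb(moe, moe_bits) + weight_gb(rest, rest_bits)

if __name__ == "__main__":
    est = mixed_quant_gb(671e9, 0.97, 2.5, 7.0)
    print(f"~{est:.0f} GB of weights, before KV cache and runtime overhead")
```

With these guesses the weights alone land a little over 220GB, which leaves some headroom for context under a ~250GB RAM + VRAM budget.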

./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"
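
In the command above, `-ngl 99` offloads all layers to the GPU, while `-ot ".ffn_.*_exps.=CPU"` overrides tensors whose names match the regex so they stay in system RAM. The sketch below checks which tensor names that pattern would catch, using Python's `re.search` as a stand-in for how llama.cpp applies it (a regex search against each tensor name, as far as I know). The tensor names are assumed examples in llama.cpp's `blk.<layer>.<name>.weight` style; check the names in your actual GGUF.

```python
import re

# Pattern from the -ot/--override-tensor argument above, with "=CPU" stripped.
PATTERN = re.compile(r".ffn_.*_exps.")

# ASSUMPTION: illustrative tensor names; real names depend on the GGUF file.
names = [
    "blk.10.attn_q_a.weight",
    "blk.10.ffn_gate_exps.weight",
    "blk.10.ffn_up_exps.weight",
    "blk.10.ffn_down_exps.weight",
    "blk.10.ffn_gate_shexp.weight",  # shared expert: no "_exps." substring
    "output.weight",
]

def offloaded(name: str) -> bool:
    # re.search (not fullmatch): the pattern may match anywhere in the name.
    return PATTERN.search(name) is not None

for n in names:
    print(f"{n:32} -> {'CPU' if offloaded(n) else 'GPU (with -ngl 99)'}")
```

The effect is that the large routed-expert weights sit in RAM while attention and shared-expert weights stay on the GPU.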

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1

tw1984 [22 Aug 25 02:20 UTC] No.44980406

>>44978800

For such a dynamic 2-bit quant, are there any benchmark results showing how much performance I would give up compared to the original model? Thanks.

segmondy [22 Aug 25 13:04 UTC] No.44984158

>>44980406

If you are running a 2-bit quant, you are not giving up performance; you are gaining 100% of it, since the alternative is usually 0%. Smaller quants are for folks who otherwise couldn't run the model at all, so you run the largest quant your hardware can hold. For instance, I often ran Q3_K_L. I don't think about how much performance I'm giving up, but rather that without Q3 I couldn't run the model at all. That said, for R1 I did some tests against 2 public interfaces, and my local Q3 crushed them. The problem with a lot of model providers is that we can never be sure what they are serving, and they could take shortcuts to maximize profit.
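
The "run the largest quant your hardware can hold" rule is simple enough to write down. In this sketch the quant labels and file sizes are hypothetical placeholders, not real numbers for any model, and `overhead_gb` is an assumed allowance for KV cache and runtime buffers.

```python
# Sketch of "run the largest quant that fits" for a single model.
# ASSUMPTION: sizes below are hypothetical; use the real file sizes listed
# for the model you are downloading.
from typing import Dict, Optional

def largest_fitting_quant(sizes_gb: Dict[str, float], budget_gb: float,
                          overhead_gb: float = 0.0) -> Optional[str]:
    """Return the biggest quant whose weights plus overhead fit in budget_gb."""
    fitting = {q: s for q, s in sizes_gb.items() if s + overhead_gb <= budget_gb}
    if not fitting:
        return None  # nothing fits: the "0%" case
    return max(fitting, key=lambda q: fitting[q])

hypothetical_sizes = {"Q2": 250.0, "Q3": 320.0, "Q4": 400.0, "Q8": 720.0}
print(largest_fitting_quant(hypothetical_sizes, budget_gb=384, overhead_gb=20))
# prints Q3
```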

linuxftw [22 Aug 25 15:05 UTC] No.44985517

>>44984158

That's true only in a vacuum. For example, should I run gpt-oss-20b unquantized or gpt-oss-120b quantized? Some model families have a 70b/30b spread, and that's only within a single base model; there are also many different models at different quants that could be compared for different tasks.
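
Raw weight memory shows why "largest quant that fits" doesn't settle this kind of choice. The sketch assumes the nominal 20B/120B parameter counts from the model names (real counts differ somewhat), treats "unquantized" as 16 bits per weight, and ignores KV cache and overhead.

```python
# Rough weight-memory comparison: smaller model at high precision vs.
# larger model at lower precision.
# ASSUMPTIONS: nominal 20e9 / 120e9 params, 16-bit as "unquantized",
# no KV cache or runtime overhead.

def weight_gb(n_params: float, bits: float) -> float:
    """Storage for n_params weights at `bits` bits each, in GB (1e9 bytes)."""
    return n_params * bits / 8 / 1e9

options = [
    ("gpt-oss-20b", 20e9, 16),
    ("gpt-oss-120b", 120e9, 4),
    ("gpt-oss-120b", 120e9, 2),
]
for name, params, bits in options:
    print(f"{name} @ {bits}-bit: ~{weight_gb(params, bits):.0f} GB")
```

Under these assumptions the 120b model at 2 bits (~30 GB) is smaller than the 20b model at 16 bits (~40 GB), so memory alone can't tell you which one will do better on a given task.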

jkingsman [22 Aug 25 16:38 UTC] No.44986618

>>44985517

Definitely. As a hobbyist, I have yet to put together a good heuristic for better-quant-fewer-params vs. smaller-quant-more-params. I've mentally been drawing the line at around Q4, but with IQ quants and other improvements in the space, I'm not so sure anymore.
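
One way to encode a "draw the line at around Q4" rule of thumb: among (model, quant) options that fit in memory, prefer the most parameters, but only consider quants at or above a minimum bit width. The candidates, sizes, and the 4-bit floor below are illustrative assumptions, and treating bits per weight as the only quality signal is exactly the simplification that IQ quants call into question.

```python
# Hypothetical "prefer more params, but not below ~4 bits" heuristic.
# ASSUMPTION: candidate models, sizes, and the min_bits floor are made up.
from typing import List, NamedTuple, Optional

class Candidate(NamedTuple):
    model: str
    params_b: float  # billions of parameters
    bits: float      # approx. bits per weight of the quant
    size_gb: float   # file size of the quant

def pick(cands: List[Candidate], budget_gb: float,
         min_bits: float = 4.0) -> Optional[Candidate]:
    fits = [c for c in cands if c.size_gb <= budget_gb and c.bits >= min_bits]
    # Most parameters first; break ties with the higher-precision quant.
    return max(fits, key=lambda c: (c.params_b, c.bits), default=None)

cands = [
    Candidate("small", 30, 8, 32),
    Candidate("small", 30, 4, 17),
    Candidate("large", 70, 4, 40),
    Candidate("large", 70, 2.5, 24),
]
print(pick(cands, budget_gb=36))  # -> the 30B model at 8 bits
print(pick(cands, budget_gb=48))  # -> the 70B model at 4 bits
```

Moving `min_bits` down to 2.5 would let the 70B model win at a 36 GB budget, which is where the heuristic stops being obvious.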

linuxftw [22 Aug 25 17:56 UTC] No.44987553

>>44986618

Yeah, I kinda quickly threw in the towel trying to figure out what's 'best' for smaller-memory systems. Things are moving so quickly that whatever time I invest in that is likely to be for naught.