
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points by wertyk
danielhanchen No.44978800
For local runs, I made some GGUFs! You need around RAM + VRAM >= 250GB for good performance with the dynamic 2-bit quant (2-bit MoE layers, 6-8-bit for the rest) - you can also offload to SSD, but it'll be slow.

./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
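
If you'd rather serve the model than run one-off prompts, the same flags should carry over to llama-server (a sketch, not tested here - the port is arbitrary):

# same quant and CPU-offload pattern as the llama-cli run above
./llama.cpp/llama-server -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU" --port 8080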

pshirshov No.44979837
By the way, I'm wondering why unsloth (a goddamn Python library) tries to run apt-get with sudo (and fails on my NixOS). Like, how tf are we supposed to use that?
danielhanchen No.44980068
Oh hey - I'm assuming this is for conversion to GGUF after a finetune? To quantize to GGUF Q4_K_M we have to compile llama.cpp, hence the apt-get calls and the compilation happening from within a Python shell.

There is a way to convert to Q8_0, BF16, or F16 without compiling llama.cpp; it's enabled if you use `FastModel` rather than `FastLanguageModel`.

Essentially I try `sudo apt-get`; if that fails, plain `apt-get`; and if everything fails, it just errors out. We need `build-essential cmake curl libcurl4-openssl-dev`.

See https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_z...
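
(For reference, that fallback chain boils down to something like this sketch - the final echo stands in for the library's actual error path:)

# try with sudo first, then without, then give up
sudo apt-get install -y build-essential cmake curl libcurl4-openssl-dev \
  || apt-get install -y build-essential cmake curl libcurl4-openssl-dev \
  || echo "could not install build deps - please install them manually"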

lambda [dead post] No.44980608
[flagged]
danielhanchen No.44980664
I added it since many people who use Unsloth don't know how to compile llama.cpp, so from the Python side the only options are to either (1) install it via apt-get within the Python shell, or (2) error out, tell the user to install it first, then continue.

I chose (1) mainly for the user's ease of use - but I agree it's not a good idea, sorry!

:( I also added a section on manually compiling llama.cpp here: https://docs.unsloth.ai/basics/troubleshooting-and-faqs#how-...

But I agree I should remove the apt-get calls - will do this ASAP! Thanks for the suggestions :)
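
For anyone who wants to sidestep the automatic install entirely, a manual build is roughly the following (a sketch - check the docs link above for the exact steps):

# clone and build llama.cpp with CURL support (needed for -hf downloads)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release -j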

exe34 No.44981704
Have you considered Cosmopolitan? E.g. llamafile, which works on everything up to and including toasters.
danielhanchen No.44981740
Oh, llamafile is very cool! I might actually add it as an option :) For generic exports (i.e. to vLLM, llamafile, etc.), finetunes normally end with model.save_pretrained_merged, which auto-merges everything to 16-bit safetensors and allows further processing downstream - but I'll investigate llamafile more! (Good timing, since llamafile is cross-platform!)
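
(A llamafile, for context, is a single self-contained executable, so running one is just the following - the filename here is hypothetical:)

chmod +x DeepSeek-V3.1.llamafile   # mark the downloaded file executable
./DeepSeek-V3.1.llamafile -ngl 99  # llamafile accepts llama.cpp-style flags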