
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points by wertyk
danielhanchen ◴[] No.44978800[source]
For local runs, I made some GGUFs! You need around RAM + VRAM >= 250GB for good performance with the dynamic 2-bit quant (2-bit MoE experts, 6-8-bit for the rest) - you can also do SSD offloading, but it'll be slow.

./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"
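To unpack the flags (roughly, going by llama.cpp's help text - double-check against your build):

# -hf                      pulls that exact quant from the Hugging Face repo on first run
# -ngl 99                  offloads up to 99 layers to the GPU, i.e. everything that fits in VRAM
# --jinja                  uses the chat template bundled with the GGUF
# -ot ".ffn_.*_exps.=CPU"  tensor-override rule: keep the MoE expert weights (the bulk of the ~250GB) in system RAM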

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1

pshirshov ◴[] No.44979837[source]
By the way, I'm wondering why unsloth (a goddamn Python library) tries to run apt-get with sudo (and fails on my NixOS). Like, how tf are we supposed to use that?
danielhanchen ◴[] No.44980068[source]
Oh hey, I'm assuming this is for conversion to GGUF after a finetune? If you need to quantize to GGUF Q4_K_M, we have to compile llama.cpp, hence the apt-get call and the llama.cpp build happening inside a Python shell.

There is a way to convert to Q8_0, BF16, or F16 without compiling llama.cpp, and it's enabled if you use `FastModel` rather than `FastLanguageModel`.

Essentially I try `sudo apt-get`; if that fails, plain `apt-get`; and if everything fails, it just fails. We need `build-essential cmake curl libcurl4-openssl-dev`.

See https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_z...
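If you'd rather keep that step out of Python entirely, the manual equivalent is just the standard llama.cpp build - a sketch of the usual steps, not the exact commands unsloth runs:

sudo apt-get install -y build-essential cmake curl libcurl4-openssl-dev   # the packages listed above (Debian/Ubuntu naming)
git clone https://github.com/ggerganov/llama.cpp
cmake -S llama.cpp -B llama.cpp/build
cmake --build llama.cpp/build --config Release -j   # llama-quantize etc. end up in llama.cpp/build/bin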

pshirshov ◴[] No.44982700{3}[source]
It won't work well for anything other than an Ubuntu + CUDA combination. Better to just fail with a reasonable message.
danielhanchen ◴[] No.44982740{4}[source]
For now I'm redirecting people to our docs: https://docs.unsloth.ai/basics/troubleshooting-and-faqs#how-...

But I'm working on more cross-platform docs as well!

pshirshov ◴[] No.44982883{5}[source]
My current solution is to package llama.cpp as a custom Nix derivation (the one in nixpkgs has a broken conversion script) and run it myself. I wasn't able to run unsloth on ROCm for either inference or conversion, so I'm sticking with peft for now, but I'll try to re-package it again.
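If it helps anyone, the conversion script can also be run straight out of a llama.cpp checkout without any packaging (a sketch - the model path is a placeholder for your local HF-format directory, and older checkouts name the script convert-hf-to-gguf.py):

pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./my-finetune --outtype f16 --outfile my-finetune-f16.gguf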
danielhanchen ◴[] No.44983054{6}[source]
Oh interesting! For ROCm there are some installation instructions here: https://rocm.docs.amd.com/projects/ai-developer-hub/en/lates...

I'm working with the AMD folks to make the process easier, but it looks like I first have to move off pyproject.toml to setup.py (which allows building binaries).

pshirshov ◴[] No.44983263{7}[source]
Yes, it's trivial with the pre-built vllm Docker image, but I need a declarative way to configure my environment. The lack of prebuilt ROCm wheels for vllm is the main hindrance for now, but I was shocked to see the sudo apt-get in your code. Ideally, llama.cpp should publish their gguf Python library and the conversion script to PyPI with every release, so you can just add that stuff as a dependency. vllm should start publishing a ROCm wheel; after that, unsloth would need to start publishing two versions: a CUDA one and a ROCm one.
danielhanchen ◴[] No.44984071[source]
Yes, apologies again - ROCm is still an issue.