./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"
More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"
More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
There is a way to convert to Q8_0, BF16, F16 without compiling llama.cpp, and it's enabled if you use `FastModel` and not on `FastLanguageModel`
Essentially I try to do `sudo apt-get` if it fails then `apt-get` and if all fails, it just fails. We need `build-essential cmake curl libcurl4-openssl-dev`
See https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_z...
You just fail and print a nice error message telling the user exactly what they need to do, including the exact apt command or whatever that they need to run.
I was thinking if I can do it during the pip install or via setup.py which will do the apt-get instead.
As a fallback, I'll probably for now remove shell executions and just warn the user
The current solution hopefully is in between - ie sudo is gone, apt-get will run only after the user agrees by pressing enter, and if it fails, it'll tell the user to read docs on installing llama.cpp
Usually you don't make assumptions on the host OS, just try to find the things you need and if not, fail, ideally with good feedback. If you want to provide the "hack", you can still do it, but ideally behind a flag, `allow_installation` or something like that. This is, if you want your code to reach broader audiences.
Some people may prefer using whatever llama.cpp in $PATH, it's okay to support that, though I'd say doing so may lead to more confused noob users spam - they may just have an outdated version lurking in $PATH.
Doing so makes unsloth wheel platform-dependent, if this is too much of a burden, then maybe you can just package llama.cpp binary and have it on PyPI, like how scipy guys maintain a https://pypi.org/project/cmake/ on PyPI (yes, you can `pip install cmake`), and then depends on it (maybe in an optional group, I see you already have a lot due to cuda shit).
I'm still working on it, but sadly I'm not a packaging person so progress has been nearly zero :(
From how I interpreted it, he meant you could create a new python package, this would effectively be the binary you need.
In your current package, you could depend on the new one, and through that - pull in the binary.
This would let you easily decouple your package from the binary,too - so it'd be easy to update the binary to latest even without pushing a new version of your original package
I've maintained release pipelines before and handled packaging in a previous job, but I'm not particularly into the python ecosystem, so take this with a grain of salt: an approach would be
Pip Packages :
* Unsloth: current package, prefers using unsloth-llama, and uses path llama-cpp as fallback (with error msg as final fallback if neither exist, promoting install for unsloth-llama)
* Unsloth-llama: new package which only bundles the llama cpp binary
I was trying to see if I could pre-compile some llama.cpp binaries then save them as a zip file (I'm a noob sorry) - but I definitely need to investigate further on how to do python pip binaries
Try to find prebuilt and download.
See if you can compile from source if a compiler is installed.
If no compiler: prompt to install via sudo apt and explaining why, also give option to abort and have the user install a compiler themselves.
This isn't perfect, but limits the cases where prompting is necessary.
But I think similarly for uv we need a setup.py for packaging binaries (more complex)