https://www.youtube.com/watch?v=coIj2CU5LMU
Would this version (ggerganov) work with one of those methods?
I did the following:
1. Create a new working directory.
2. git clone https://github.com/ggerganov/llama.cpp
3. Download the latest release from https://github.com/ggerganov/llama.cpp/releases (note the CPU requirements in the filename) and unzip it directly into the working directory's llama.cpp/ folder, so the .exe files and the .py scripts end up in the same directory.
4. Open PowerShell, cd into the working directory's llama.cpp folder, create a new Python virtual environment (python3 -m venv env), and activate it (.\env\Scripts\Activate.ps1).
5. Obtain the LLaMA model(s) via the magnet torrent link and place them in the models directory. I used 30B; it is slow but usable on my system. It is not at ChatGPT-3 level, especially for programming questions, but it is still impressive.
6. python3 -m pip install torch numpy sentencepiece
7. python3 convert-pth-to-ggml.py models/30B/ 1 (the trailing 1 selects f16 output; you may delete the original .pth model files after this step to save disk space)
8. .\quantize.exe ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q4_0.bin 2 (the trailing 2 selects q4_0 quantization)
9. I copied examples/chat-13B.bat to a new chat-30B.bat file, updated the model directory, and changed the executable call on the last line of the script to: .\main.exe
10. Run using: .\examples\chat-30B.bat
https://github.com/ggerganov/llama.cpp#usage has more details, though it assumes the 7B model and skips a few of the steps above.
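For reference, the steps above can be collected into a single PowerShell sequence (a sketch only: it assumes you have already downloaded the 30B weights into models/30B/ and extracted the release zip into llama.cpp/, and it skips the manual chat-30B.bat edit):

```powershell
# Clone the repo; the prebuilt Windows binaries go in here too (steps 2-3)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# (unzip the release .zip here so quantize.exe and main.exe sit next to the .py scripts)

# Create and activate a virtual environment, then install the conversion dependencies (steps 4, 6)
python3 -m venv env
.\env\Scripts\Activate.ps1
python3 -m pip install torch numpy sentencepiece

# Convert the .pth weights to ggml f16, then quantize to 4-bit q4_0 (steps 7-8)
python3 convert-pth-to-ggml.py models/30B/ 1
.\quantize.exe .\models\30B\ggml-model-f16.bin .\models\30B\ggml-model-q4_0.bin 2

# Launch the chat script (steps 9-10, after editing your copy of the .bat file)
.\examples\chat-30B.bat
```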