
347 points kashifr | 1 comment
danielhanchen No.44504715
I fixed some chat template issues for llama.cpp and other inference engines! To run it, do:

./llama.cpp/llama-cli -hf unsloth/SmolLM3-3B-GGUF:Q4_K_XL --jinja -ngl 99
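(For context on the flags: -hf pulls the GGUF straight from the Hugging Face repo, --jinja tells llama-cli to use the chat template embedded in the GGUF metadata instead of a built-in preset, and -ngl 99 offloads up to 99 layers to the GPU; drop it if you're running CPU-only.)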

diggan No.44507813
> fixed some chat template issues

This seems to be a persistent issue with almost all weight releases, even from bigger companies like Meta.

Are the people who release these weights not testing them in various inference engines? Seems they make it work with Hugging Face's Transformers library, then call it a day, but sometimes not even that.
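A smoke test along these lines (a minimal sketch; the repo id is assumed for illustration) would catch the most obvious breakage on the Transformers side, though a template that renders fine here can still trip up llama.cpp's more limited Jinja implementation:

    from transformers import AutoTokenizer

    # Repo id assumed for illustration; point it at the release under test.
    tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi."},
        {"role": "user", "content": "What's 2+2?"},
    ]

    # Render the embedded Jinja chat template without tokenizing;
    # a broken template usually raises here, or emits malformed
    # special tokens you can eyeball in the printed prompt.
    prompt = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)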

clarionbell No.44508570
No, they don't. Why would they? Most of them use a single inference engine, most likely developed in-house, or they go with something like vLLM; llama.cpp in particular flies under their radar.

The reason is simple: there isn't much money in it. llama.cpp is free and targets the lower end of the hardware spectrum. Corporations will run something else or, even more likely, offload the task to a contractor.