
347 points kashifr | 1 comment
danielhanchen No.44504715
I fixed some chat template issues for llama.cpp and other inference engines! To run it, do:

./llama.cpp/llama-cli -hf unsloth/SmolLM3-3B-GGUF:Q4_K_XL --jinja -ngl 99
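(For context on the flags: -hf pulls the GGUF straight from the Hugging Face repo, --jinja tells llama-cli to use the chat template embedded in the GGUF metadata instead of a built-in preset, and -ngl 99 offloads up to 99 layers to the GPU; drop it if you're running CPU-only.)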

diggan No.44507813
> fixed some chat template issues

This seems to be a persistent issue with almost all weight releases, even from bigger companies like Meta.

Are the people who release these weights not testing them in various inference engines? Seems they make it work with Hugging Face's Transformers library, then call it a day, but sometimes not even that.
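A smoke test along these lines (a minimal sketch; the repo id is assumed for illustration) would catch the most obvious breakage on the Transformers side, though a template that renders fine here can still trip up llama.cpp's more limited Jinja implementation:

    from transformers import AutoTokenizer

    # Repo id assumed for illustration; point it at the release under test.
    tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi."},
        {"role": "user", "content": "What's 2+2?"},
    ]

    # Render the embedded Jinja chat template without tokenizing;
    # a broken template usually raises here, or emits malformed
    # special tokens you can eyeball in the printed prompt.
    prompt = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)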

clarionbell No.44508570
No, they don't. Why would they? Most of them use a single inference engine, most likely developed in-house, or they go with something like vLLM; llama.cpp in particular flies under their radar.

The reason is simple: there isn't much money in it. llama.cpp is free and targets the lower end of the hardware spectrum. Corporations will run something else or, even more likely, offload the task to a contractor.