I'm having trouble running this on my Mac - I've tried Ollama and llama.cpp llama-server so far, both using GGUFs from Hugging Face, but neither worked.
(llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'smollm3')
I've managed to run it using Python and transformers with PyTorch in device="cpu" mode but unsurprisingly that's really slow - it took 35s to respond to "say hi"!
Anyone had success with this on a Mac yet? I really want to get this running with tool calling, ideally via an OpenAI-compatible serving layer like llama-server.
replies(2):