
347 points | kashifr | 1 comment
simonw:
I'm having trouble running this on my Mac - I've tried Ollama and llama.cpp's llama-server so far, both using GGUFs from Hugging Face, but neither worked.

(llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'smollm3')

I've managed to run it using Python and transformers with PyTorch in device="cpu" mode, but unsurprisingly that's really slow - it took 35s to respond to "say hi"!
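
A minimal sketch of that transformers-on-CPU setup, for anyone who wants to reproduce the timing; the repo id is an assumption, so check the actual SmolLM3 model card:

    # Hedged sketch: load SmolLM3 with transformers and generate on CPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float32
    ).to("cpu")

    inputs = tokenizer("say hi", return_tensors="pt").to("cpu")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))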

Anyone had success with this on a Mac yet? I really want to get this running with tool calling, ideally via an OpenAI-compatible serving layer like llama-server.
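
For the tool-calling goal, once any OpenAI-compatible server is up (e.g. llama-server -m model.gguf --port 8080, which serves an OpenAI-compatible API under /v1), a request could look like the sketch below; the model name and the get_weather tool are illustrative assumptions:

    # Hedged sketch: tool calling against an OpenAI-compatible endpoint.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="smollm3",  # assumed name; use whatever the server reports
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )
    print(response.choices[0].message.tool_calls)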

tripplyons (replying to simonw):
Have you tried setting device="mps" to use Metal? It should be faster than PyTorch's "cpu" device on Mac.
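
A sketch of that suggestion, with a fallback in case Metal isn't available; same assumed repo id as above:

    # Hedged sketch: prefer PyTorch's Metal ("mps") backend when present.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "mps" if torch.backends.mps.is_available() else "cpu"

    model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16  # fp16 is typical on mps
    ).to(device)

    inputs = tokenizer("say hi", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))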