I have an NVIDIA Jetson Orin Nano running llama.cpp/Ollama. Gemma3:4b / Gemma3-4b-it is awesome and reasonably fast (even with vision, I think it's around 15 t/s), and all that on a Raspberry Pi-sized board.
Simon's llm client tool is on every machine, and I use it daily.