My current setup is the llama-vscode plugin plus llama-server running Qwen/Qwen2.5-Coder-7B-Instruct. Completions are very fast, and I don't have to worry about internet outages taking me out of the zone.
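For reference, the launch looks something like this (a sketch; the exact flags depend on your llama.cpp build, and the GGUF repo name here is an assumption):

```shell
# Serve the model locally so the llama-vscode extension can hit it.
# -hf downloads a GGUF build from Hugging Face (repo name is an assumption);
# alternatively point -m at a local .gguf file.
llama-server -hf Qwen/Qwen2.5-Coder-7B-Instruct-GGUF --port 8080
```

Then point the extension at http://localhost:8080 and it uses the server's infill endpoint for FIM completions.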
I do wish the Qwen3 line had released a 7B model supporting FIM tokens. 7B seems to be the sweet spot for fast, usable completions.