Also, I think a local model used just for autocomplete could help reduce latency for completion suggestions.
Which model(s) are you running, on which runtime (e.g., Ollama, LM Studio, or others), and which open-source coding assistant or integration (for example, a VS Code plugin) are you using?
What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?
What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well, and where does it fall short)?
I'm conducting my own investigation and will be happy to share the results once it's done.
Thanks! Andrea.
> Also, I think a local model used just for autocomplete could help reduce latency for completion suggestions.

For the big agentic tasks or reasoning-heavy questions, the many seconds or even minutes of LLM time dwarf the network round-trip time (RTT), even to another continent.
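A back-of-the-envelope sketch of that argument. All numbers here are made-up assumptions, not measurements: roughly 150 ms RTT to another continent, about 0.2 s of model time for a single completion, and about 60 s for an agentic task. Plug in your own figures.

```python
# Illustrative latency budget. Every number below is an assumption, not a measurement.

def rtt_share(rtt_s: float, model_s: float) -> float:
    """Fraction of the total wait spent on the network round trip."""
    return rtt_s / (rtt_s + model_s)

RTT_INTERCONTINENTAL_S = 0.15  # assumed ~150 ms round trip to another continent

scenarios = {
    "autocomplete (assumed 0.2 s of model time)": 0.2,
    "agentic task (assumed 60 s of model time)": 60.0,
}

for name, model_s in scenarios.items():
    share = rtt_share(RTT_INTERCONTINENTAL_S, model_s)
    print(f"{name}: RTT is {share:.1%} of the wait")
```

With these assumptions the round trip is a large chunk of a quick completion but a rounding error on a long agentic run, which is why a local model mostly pays off for autocomplete.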
Side note: I recently had GPT-5 in Cursor spend a full 45 minutes on a single prompt, chewing on why a bug was flaky, and it figured it out! Your laptop is not gonna do that anytime soon.