
326 points by threeturn | 1 comment

Dear Hackers, I'm interested in your real-world workflows for running open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., via Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well, where it falls short)?

I'm conducting my own investigation as well, which I'll be happy to share here when it's done.

Thanks! Andrea.

dust42 No.45773793
On a 64GB MacBook Pro I run the Qwen3-Coder-30B-A3B Q4 quant with llama.cpp.
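For anyone wanting to reproduce a setup like this, a llama.cpp server launch might look roughly like the following. This is a sketch, not the commenter's actual command: the model filename and path are assumptions (quant naming varies by upload), and the flag values are typical choices for Apple Silicon, where `--n-gpu-layers 99` offloads all layers to Metal.

```shell
# Hypothetical llama-server invocation (model path/filename assumed).
llama-server \
  --model ~/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --port 8080
```

This exposes an OpenAI-compatible HTTP API on localhost:8080 that editor integrations can point at.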

For VS Code I use continue.dev, as it lets me set my own (short) system prompt. I get around 50 tokens/sec generation and 550 tokens/sec prompt processing.
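Wiring continue.dev to a local llama.cpp server with a custom system prompt might look something like this config fragment. The field names follow Continue's model configuration, but the model name, port, and prompt text are illustrative assumptions; check them against your installed Continue version.

```yaml
# Hypothetical continue.dev config snippet (names/values assumed).
models:
  - name: Qwen3 Coder 30B (local)
    provider: llama.cpp
    model: qwen3-coder-30b-a3b
    apiBase: http://localhost:8080
    systemMessage: "You are a concise coding assistant. Prefer short answers with code."
```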

When given small, well-defined tasks, it is as good as any frontier model.

I like the speed, the low latency, and the availability while on a plane or train or off-grid.

FIM (fill-in-the-middle) completion is also decent with the llama.cpp VS Code plugin.
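Under the hood, FIM completion against a llama.cpp server can be exercised directly via its `/infill` endpoint, which is what editor plugins use for completions with FIM-capable models. A hedged sketch against an assumed server on port 8080 (field names follow llama.cpp's server docs but may change between versions):

```shell
# Ask the server to fill in code between a prefix and a suffix.
curl http://localhost:8080/infill \
  -d '{"input_prefix": "def add(a, b):\n    ", "input_suffix": "\n", "n_predict": 16}'
```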

If I need more intelligence, my personal favourites are Claude and DeepSeek via API.

codingbear No.45775753
How are you running Qwen3 with llama-vscode? I am still using qwen-2.5-7b.

There is an open issue about adding support for Qwen3, which I have been monitoring; I would love to use Qwen3 if possible. Issue: https://github.com/ggml-org/llama.vscode/issues/55