
326 points by threeturn | 1 comment

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., via Ollama, LM Studio, or something else), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation as well, which I'll be happy to share once it's done.

Thanks! Andrea.

jwpapi:
On a side note, I really think latency is still important. Is there some benefit in choosing the location your responses come from, e.g. with OpenRouter?

Also, I'd think a local model just for autocomplete could help reduce latency for completion suggestions.
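
A quick way to sanity-check that: time an autocomplete-sized request against a local Ollama server. Rough sketch below; it assumes Ollama is running on its default port with a small coding model already pulled (the model name is just an example).

    import json
    import time
    import urllib.request

    # Autocomplete-style request: short prompt, hard cap on output tokens.
    payload = json.dumps({
        "model": "qwen2.5-coder:1.5b",   # example model; any small local model works
        "prompt": "def fibonacci(n):",
        "stream": False,
        "options": {"num_predict": 32},  # completions should be short
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(f"{time.perf_counter() - start:.2f}s end-to-end")
    print(body["response"])

Streaming and measuring time-to-first-token would be the fairer comparison for autocomplete, but total time on a token-capped request is a decent first proxy.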

oofbey (in reply):
Latency matters for the autocomplete models. But IMHO those suck and generally just get in the way.

For the big agentic tasks or reasoning-heavy questions, the many seconds or even minutes of LLM time dwarf the RTT, even to another continent.
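
Back-of-envelope version of that (all numbers assumed, just to show the ratio):

    # Rough ratio of network overhead to model time in an agentic run.
    rtt_s = 0.15        # assumed intercontinental round trip (~150 ms)
    api_calls = 20      # assumed API round trips in one agentic task
    llm_time_s = 120.0  # assumed total model "thinking" time (2 min)

    overhead_s = rtt_s * api_calls
    share = overhead_s / (overhead_s + llm_time_s)
    print(f"network: {overhead_s:.1f}s, {share:.1%} of the total")
    # -> network: 3.0s, 2.4% of the total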

Side note: I recently had GPT-5 in Cursor spend fully 45 minutes on one prompt chewing on why a bug was flaky, and it figured it out! Your laptop is not gonna do that anytime soon.