326 points | threeturn | 3 comments

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running, under which runtime (e.g., Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share as well once it's done.

Thanks! Andrea.

simonw ◴[] No.45773803[source]
I'd be very interested to hear from anyone who's finding local models that work well for coding agents (Claude Code, Codex CLI, OpenHands etc).

I haven't found a local model that fits on a 64GB Mac or 128GB Spark yet that appears to be good enough to reliably run bash-in-a-loop over multiple turns, but maybe I haven't tried the right combination of models and tools.

replies(1): >>45773874 #
embedding-shape ◴[] No.45773874[source]
I've had good luck with GPT-OSS-120b (reasoning_effort set to "high") + Codex + llama.cpp, all running locally, but I needed to make some local patches to Codex because it doesn't let you configure the right temperature and top_p values for GPT-OSS. Heavy prompting via AGENTS.md was also needed to get it to follow a workflow similar to GPT-5's; it didn't pick that up by itself, so I'm assuming GPT-5 was trained with Codex in mind while GPT-OSS wasn't.
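For anyone wanting to try a similar setup, here's a rough sketch of serving the model with llama.cpp's llama-server. The model path, port, and context size are placeholders to adjust for your machine, and the sampling flags reflect the values discussed downthread (temperature 1.0, top_p 1.0, top_k disabled):

```shell
# Sketch: serve a local GGUF build of GPT-OSS-120b over an OpenAI-compatible API.
# Model filename, port, and --ctx-size are placeholders, not canonical values.
llama-server \
  -m ./gpt-oss-120b.gguf \
  --port 8080 \
  --ctx-size 32768 \
  --temp 1.0 \
  --top-p 1.0 \
  --top-k 0 \
  --jinja        # use the model's built-in chat template
```

A coding agent can then be pointed at http://localhost:8080/v1 as an OpenAI-compatible endpoint, though (as noted above) Codex may still need patching to stop it from overriding the sampling parameters.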
replies(1): >>45775120 #
1. Xenograph ◴[] No.45775120[source]
Would love for you to share the Codex patches you needed to make and the AGENTS.md prompting, if you're open to it.
replies(1): >>45775676 #
2. embedding-shape ◴[] No.45775676[source]
Basically just find the place where the inference call happens and hard-code top_k, top_p and temperature (0, 1.0 and 1.0 respectively for GPT-OSS), and you should be good to go. If you really need it, I could dig out the patch, but it should be really straightforward today, and my patch might conflict with the current master of codex; I've diverged for other reasons since I did this.
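For anyone unfamiliar with what those three numbers control, here's a small self-contained sketch of the standard temperature / top-k / top-p (nucleus) sampling pipeline. It follows the llama.cpp convention that top_k = 0 means "disabled", so the GPT-OSS values (0, 1.0, 1.0) leave the model's raw distribution untouched:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Draw a token index from logits using temperature, top-k and top-p filtering."""
    # Temperature scaling: 1.0 leaves the distribution unchanged.
    scaled = [l / temperature for l in logits]

    # Softmax (shifted by the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda t: t[1],
        reverse=True,
    )

    # top_k = 0 means "keep all tokens" (llama.cpp convention).
    if top_k > 0:
        probs = probs[:top_k]

    # Nucleus (top-p) filtering: keep the smallest prefix whose mass >= top_p.
    # top_p = 1.0 keeps everything.
    kept, cum = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalize over the kept tokens and draw one.
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]
```

So the patch isn't changing the sampling algorithm at all; it's just pinning the knobs to values where the filtering steps become no-ops, which is what GPT-OSS expects.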
replies(1): >>45777317 #
3. Xenograph ◴[] No.45777317[source]
That makes sense; I wasn't sure whether it was as simple as tweaking those numbers or not. Thanks for sharing!

If there's any insight you can share about your AGENTS.md prompting, it may also be helpful for others!