326 points threeturn | 4 comments

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., via Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share as well once it's done.

Thanks! Andrea.

1. embedding-shape ◴[] No.45773654[source]
> Which model(s) are you running (e.g., Ollama, LM Studio, or others)

I'm running mainly GPT-OSS-120b/20b depending on the task, Magistral for multimodal stuff, and some smaller models I've fine-tuned myself for specific tasks.

All the software is implemented by myself, but I started out by basically calling out to llama.cpp, as it was the simplest and fastest option that let me integrate it into my own software without requiring a GUI.
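A minimal sketch of what that kind of GUI-less integration can look like, assuming llama.cpp's bundled llama-server is running locally and exposing its OpenAI-compatible API; the host, port, and model name below are placeholders, not the actual setup described above:

    # Minimal client for a local llama.cpp server (llama-server), which exposes
    # an OpenAI-compatible /v1/chat/completions endpoint. Assumes the server was
    # started with something like: llama-server -m gpt-oss-120b.gguf --port 8080
    import json
    import urllib.request

    def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
        payload = {
            "model": "gpt-oss-120b",  # placeholder; the server may ignore it
            "messages": [{"role": "user", "content": prompt}],
        }
        req = urllib.request.Request(
            f"{base_url}/v1/chat/completions",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["choices"][0]["message"]["content"]

    if __name__ == "__main__":
        print(chat("Explain what a fused kernel is in one paragraph."))

The same kind of OpenAI-compatible endpoint is also what coding agents can be pointed at when you want them to use a local model instead of a hosted one.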

I use Codex and Claude Code from time to time to do some mindless work too: Codex hooked up to my local GPT-OSS-120b, while Claude Code uses Sonnet.

> What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

It's a desktop, not a laptop: Ryzen 9 5950X, 128GB of RAM, RTX Pro 6000 Blackwell (96GB VRAM). It performs very well, and I can run most of the models I use daily all at the same time. The exception is when I want a really large context, in which case it's just GPT-OSS-120B with max context, which ends up taking ~70GB of VRAM.

> What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

Almost anything and everything, but mostly coding. Beyond that: general questions, researching topics, troubleshooting issues with my local infrastructure, troubleshooting things in my other hobbies, and a bunch of other stuff. As long as you give the local LLM access to a search tool (I use YaCy + my own adapter), local models work better for me than the hosted ones, mainly because of the speed and because I have better control over the inference.
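A rough sketch of one way such a search adapter can be wired in, assuming a local YaCy instance and OpenAI-style tool calling; the YaCy URL, the response shape, and the tool name are assumptions for illustration, not the adapter described above:

    # Sketch of a local "web_search" tool backed by YaCy, exposed to a local model
    # via OpenAI-style tool calling. URL and YaCy response shape are assumptions.
    import json
    import urllib.parse
    import urllib.request

    YACY_URL = "http://localhost:8090/yacysearch.json"  # default YaCy port, assumed

    def web_search(query: str, limit: int = 5) -> list[dict]:
        """Query a local YaCy instance and return title/link/snippet dicts."""
        params = urllib.parse.urlencode({"query": query, "maximumRecords": limit})
        with urllib.request.urlopen(f"{YACY_URL}?{params}") as resp:
            data = json.load(resp)
        items = data.get("channels", [{}])[0].get("items", [])
        return [{"title": i.get("title"), "link": i.get("link"),
                 "snippet": i.get("description")} for i in items]

    # Tool schema advertised to the model: when it replies with a tool call,
    # run web_search() and feed the results back as a "tool" message.
    SEARCH_TOOL = {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the local YaCy index and the web.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }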

It does fall short on really complicated stuff. Right now I'm trying to do CUDA programming, creating a fused MoE kernel for inference in Rust, and it's a bit tricky: there are a lot of moving parts and I don't understand the subject 100%, and when you get to that point it's hit or miss. You really need a proper understanding of what you're using the LLM for, otherwise it breaks down quickly. Divide and conquer, as always, helps a lot.

replies(1): >>45775396 #
2. andai ◴[] No.45775396[source]
gpt-oss-120b keeps stopping for me in Codex. (Also in Crush.)

I have to say "continue" constantly.

replies(1): >>45775692 #
3. embedding-shape ◴[] No.45775692[source]
See https://news.ycombinator.com/item?id=45773874 (TL;DR: you need to hard-code some inference parameters to the right values, otherwise you get really bad behaviour, plus some prompting to get the workflow right).
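For illustration only, a sketch of what "hard-coding inference parameters" can look like in a request to a local OpenAI-compatible server; the values below are the commonly cited gpt-oss sampling recommendations, not necessarily the exact settings from the linked comment:

    # Same kind of request as the earlier llama-server sketch, but with the
    # sampling parameters pinned in the payload instead of inherited from the
    # agent's defaults. Values are illustrative, not the settings from the link.
    payload = {
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": "Refactor this function."}],
        "temperature": 1.0,   # pinned instead of whatever the agent sends
        "top_p": 1.0,
        "max_tokens": 8192,   # placeholder; too small a cap can also look like early stops
    }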
replies(1): >>45776746 #
4. andai ◴[] No.45776746{3}[source]
Thanks. Did you need to modify Codex's prompt?