
330 points | threeturn | 2 comments

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running, with which runtime (e.g., Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share once it's complete.

Thanks! Andrea.

giancarlostoro ◴[] No.45774851[source]
If you're going to get a MacBook, get the Pro: it has a built-in fan, so you don't have the heat just sitting there like on the MacBook Air. Same with the Mac mini: get the Studio instead, since it has a fan and the Mini does not. I don't know about you, but I wouldn't want my brand-new laptop or desktop heating up the entire time I'm coding with zero chance to cool off. If you go the Mac route, I also recommend TG Pro: the default fan settings on the Mac are awful and don't kick in soon enough, while TG Pro lets you make them a little more "sensitive" to those temperature shifts. It's about $20 if I remember correctly, but worth it.

I have a MacBook Pro with an M4 Pro chip and 24GB of RAM. I believe only 16GB of that is usable by the models, so I can run GPT OSS, but only the smaller 20B model (iirc). It can do a bit, but the context window fills up quickly, so I find myself starting a fresh context window often enough. I do wonder whether a maxed-out MacBook Pro could handle larger context windows; then I could easily code all day with it offline.
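
For reference, the "only 16 of 24GB" figure lines up with the commonly reported rule of thumb that macOS lets the GPU wire roughly two-thirds of unified memory by default (closer to three-quarters on larger configurations), and that the limit can be raised via the iogpu.wired_limit_mb sysctl. Here's a minimal sketch to check your own machine, treating both the percentages and the sysctl name as community-reported heuristics rather than documented behavior:

    # Estimate the GPU-usable share of unified memory on an Apple Silicon Mac.
    # The 2/3 vs 3/4 split and the iogpu.wired_limit_mb sysctl are community-
    # reported heuristics, not documented guarantees.
    import subprocess

    def sysctl_int(name):
        try:
            out = subprocess.run(["sysctl", "-n", name],
                                 capture_output=True, text=True, check=True)
            return int(out.stdout.strip())
        except (subprocess.CalledProcessError, FileNotFoundError, ValueError):
            return None

    total_gb = (sysctl_int("hw.memsize") or 0) / 2**30
    custom_mb = sysctl_int("iogpu.wired_limit_mb")  # 0/None means stock default

    # Rule of thumb: ~2/3 of RAM on smaller machines, ~3/4 on larger ones.
    default_gb = total_gb * (0.75 if total_gb > 36 else 0.67)

    print(f"unified memory: {total_gb:.0f} GB")
    if custom_mb:
        print(f"custom GPU wired limit: {custom_mb / 1024:.1f} GB")
    else:
        print(f"estimated default GPU budget: ~{default_gb:.0f} GB")

On a 24GB machine that estimate works out to roughly 16GB, which matches what the models actually get to use.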

I do think Macs are phenomenal at running local LLMs if you get the right one.

replies(5): >>45774895 #>>45774912 #>>45775019 #>>45776988 #>>45777550 #
1. embedding-shape ◴[] No.45774895[source]
> I do think Macs are phenomenal at running local LLMs if you get the right one.

What does prompt processing speed look like today? I think it was either an M3 or an M4 with 128GB: trying to run even slightly longer prompts took forever for the initial prompt processing, so whatever speed gain you got at inference basically didn't matter. Maybe it works better today?
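
For anyone who wants to put numbers on this, here's a minimal sketch: time-to-first-token roughly captures prompt processing, and the streamed chunk rate after that roughly captures generation speed. It assumes a local OpenAI-compatible server (LM Studio defaults to port 1234, Ollama to 11434) and the openai Python package; the base URL and model name below are placeholders.

    # Rough benchmark: time-to-first-token ~ prompt processing,
    # streamed chunks/sec after that ~ generation speed.
    import time
    from openai import OpenAI

    # Placeholder endpoint: LM Studio's default port; Ollama would be :11434/v1.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    long_prompt = "Summarize the following:\n" + ("lorem ipsum dolor sit amet " * 500)

    start = time.perf_counter()
    first_token_at = None
    chunks = 0
    stream = client.chat.completions.create(
        model="gpt-oss-20b",  # placeholder: whatever model the server has loaded
        messages=[{"role": "user", "content": long_prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks += 1
    end = time.perf_counter()

    if first_token_at is not None:
        print(f"prompt processing (time to first token): {first_token_at - start:.1f}s")
        print(f"generation: ~{chunks / max(end - first_token_at, 1e-9):.1f} chunks/s")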

replies(1): >>45778334 #
2. giancarlostoro ◴[] No.45778334[source]
I have only ever used the M4 (on my wife's MacBook Air) and the M4 Pro (on my MacBook Pro), and the speeds were reasonable. I was able to tie LM Studio into PyCharm and ask it questions about code, but my context window kept running out; I don't think the 24GB model is the right choice. The key thing to watch out for is that, for example, I might have 24GB of RAM, but only 16GB of it can be used as VRAM. That still puts the Mac ahead of my 3080 in terms of VRAM, though my 3080 could probably run circles around my M4 Pro if it wanted to.
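
For concreteness, here's a sketch of the "ask the local model about code" part. It assumes LM Studio's local server is running on its default port (1234) with some model loaded; the file path, model name, and dummy API key are all placeholders. Editor integrations like the one described typically sit on top of the same OpenAI-compatible endpoint.

    # Minimal "ask a local model about a file" sketch against LM Studio's
    # OpenAI-compatible server. Path, model name, and API key are placeholders.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # dummy key

    code = Path("app/models.py").read_text()  # placeholder file

    resp = client.chat.completions.create(
        model="gpt-oss-20b",  # placeholder: whatever model LM Studio has loaded
        messages=[
            {"role": "system", "content": "You are a concise code reviewer."},
            {"role": "user", "content": f"What does this module do? Any obvious bugs?\n\n{code}"},
        ],
    )
    print(resp.choices[0].message.content)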