91 points by Olshansky | 1 comment

What I’m asking HN:

What does your actually useful local LLM stack look like?

I’m looking for something that provides you with real value — not just a sexy demo.

---

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.

My daily (remote) LLM stack:

  - Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.

  - Windsurf Pro ($15/mo): Love the multi-line autocomplete and the way it pulls in clipboard and surrounding context.

  - ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.

Here’s what I’ve cobbled together for my local stack so far (a rough wiring sketch follows the lists):

Tools

  - Ollama: for running models locally

  - Aider: Claude-code-style CLI interface

  - VSCode w/ continue.dev extension: local chat & autocomplete

Models

  - Chat: llama3.1:latest

  - Autocomplete: Qwen2.5 Coder 1.5B

  - Coding/Editing: deepseek-coder-v2:16b
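
Roughly how this wires together (assuming the stock Ollama model tags and Aider's Ollama backend; exact tags and flags may vary on your setup):

  # pull the three models into Ollama
  $ ollama pull llama3.1:latest
  $ ollama pull qwen2.5-coder:1.5b
  $ ollama pull deepseek-coder-v2:16b

  # point Aider at the local Ollama server (11434 is Ollama's default port)
  $ export OLLAMA_API_BASE=http://127.0.0.1:11434
  $ aider --model ollama/deepseek-coder-v2:16b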

Things I’m not worried about:

  - CPU/Memory (running on an M1 MacBook)

  - Cost (within reason)

  - Data privacy / being trained on (not trying to start a philosophical debate here)

I am worried about:

  - Actual usefulness (i.e. “vibes”)

  - Ease of use (tools that fit with my muscle memory)

  - Correctness (not benchmarks)

  - Latency & speed

Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.

---

Who I am

  - CTO of a small startup (5 amazing engineers)

  - 20 years of coding (since I was 13)

  - Ex-big tech

---

ttkciar (No.44576526):
Senior software engineer with 46 years of experience (since I was 7). LLM inference hasn't been too useful for me for writing code, but it has proven very useful for explaining my coworkers' code to me.

Recently I had Gemma3-27B-it explain every Python script and library in a repo with the command:

  $ find . -name '*.py' -print -exec sh -c '/home/ttk/bin/g3 "Explain this code in detail:\n\n$(cat "$1")"' sh {} \; | tee explain.txt

There were a few files it couldn't figure out in isolation, so I ran a second pass on those, also including the source files they depended on in the prompt. Overall, pretty easy, and highly clarifying.
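
A second pass along these lines (utils.py and main.py are invented names for illustration, with main.py importing utils.py):

  $ /home/ttk/bin/g3 "Explain this code in detail. Helper library:\n\n$(cat utils.py)\n\nScript that uses it:\n\n$(cat main.py)"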

My shell script for wrapping llama.cpp's llama-cli and Gemma3: http://ciar.org/h/g3

That script references this grammar file which forces llama.cpp to infer only ASCII: http://ciar.org/h/ascii.gbnf
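
The linked file is the authoritative version, but a minimal grammar to the same effect might look like this (llama.cpp GBNF syntax; the \x escapes are hex character codes, so this allows printable ASCII plus newline and tab):

  $ cat ascii.gbnf    # hypothetical reconstruction of the linked file
  root ::= [\x20-\x7e\x0a\x09]*

  # usage sketch; the model path here is made up
  $ llama-cli -m gemma-3-27b-it-Q4_K_M.gguf --grammar-file ascii.gbnf -p "Explain this code in detail: ..."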

Cost: electricity

I've been meaning to check out Aider and GLM-4, but even if it's all it's cracked up to be, I expect to use it sparingly. Skills which aren't exercised are lost, and I'd like to keep my programming skills sharp.

andrei_says_ (No.44695117) replies:
Thank you for this. I used an LLM to summarize some key files of an old project of mine, and the summary was excellent. Saved me a ton of time analyzing my own code :)