
91 points by Olshansky | 1 comment

What I’m asking HN:

What does your actually useful local LLM stack look like?

I’m looking for something that provides you with real value — not just a sexy demo.

---

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.

My daily (remote) LLM stack:

  - Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.

  - Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.

  - ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.

Here’s what I’ve cobbled together for my local stack so far (a rough sketch of the wiring follows the lists):

Tools

  - Ollama: for running models locally

  - Aider: Claude Code-style CLI interface

  - VSCode w/ continue.dev extension: local chat & autocomplete

Models

  - Chat: llama3.1:latest

  - Autocomplete: Qwen2.5 Coder 1.5B

  - Coding/Editing: deepseek-coder-v2:16b
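
Here is roughly how those pieces wire together (the model tags match what the Ollama library publishes; the continue.dev config below is illustrative, and its path and key names may vary by version):

    # pull the models listed above
    ollama pull llama3.1
    ollama pull qwen2.5-coder:1.5b
    ollama pull deepseek-coder-v2:16b

    # point continue.dev at the local Ollama server for chat + autocomplete
    cat > ~/.continue/config.json <<'EOF'
    {
      "models": [
        { "title": "Llama 3.1", "provider": "ollama", "model": "llama3.1:latest" }
      ],
      "tabAutocompleteModel": {
        "title": "Qwen2.5 Coder 1.5B",
        "provider": "ollama",
        "model": "qwen2.5-coder:1.5b"
      }
    }
    EOF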

Things I’m not worried about:

  - CPU/Memory (running on an M1 MacBook)

  - Cost (within reason)

  - Data privacy / being trained on (not trying to start a philosophical debate here)

I am worried about:

  - Actual usefulness (i.e. “vibes”)

  - Ease of use (tools that fit with my muscle memory)

  - Correctness (not benchmarks)

  - Latency & speed

Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.

---

Who I am

  - CTO of a small startup (5 amazing engineers)

  - 20 years of coding (since I was 13)

  - Ex-big tech

---

alkh | No.44573276
I personally found Qwen2.5 Coder 7B to be on par with deepseek-coder-v2:16b (while using less RAM and running faster at inference), so that's what I'm using locally. I actually created a custom model called "oneliner" that uses Qwen2.5 Coder 7B as a base and this system prompt (a sketch of the full Modelfile setup follows below):

SYSTEM """ You are a professional coder. Your goal is to reply to the user's questions in a concise and clear way. Your reply must include only code or commands, so that the user can easily copy and paste them.

Follow these guidelines for Python: 1) NEVER recommend using "pip install" directly; always recommend "python3 -m pip install". 2) The following are PyPI modules: ruff, pylint, black, autopep8, etc. 3) If the error is "module not found", recommend installing the module with "python3 -m pip install". 4) If activate is not available, create an environment using "python3 -m venv .venv". """
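
A minimal sketch of how such a model can be registered, using Ollama's Modelfile format ("ollama create" is the real mechanism; the file below is a reconstruction, not the commenter's exact setup):

    # Modelfile: base model plus the system prompt quoted above
    FROM qwen2.5-coder:7b
    SYSTEM """
    You are a professional coder. Your goal is to reply to the user's
    questions in a concise and clear way. [...full prompt as above...]
    """

    # register the custom model, then query it for one-off answers
    ollama create oneliner -f Modelfile
    ollama run oneliner "undo the last git commit but keep the changes"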

I specifically use it for asking quick questions in the terminal whose answers I can copy and paste straight away (e.g., about git). For heavy lifting I use ChatGPT Plus (my own) + GitHub Copilot (provided by my company) + Gemini (also provided by my company).

Can someone explain how one can set up autocomplete via ollama? That's something I would be interested to try.

replies(1): >>44573556 #
CamperBob2 | No.44573556
> NEVER recommend using "pip install" directly, always recommend "python3 -m pip install"

Just out of curiosity, what's the difference?

Seems like all the cool kids are using uv.

replies(5): >>44573684 #>>44573816 #>>44574108 #>>44578031 #>>44579428 #
th0ma5 | No.44573684
You mean to say that there's a lot of hype for uv because it's nice and quick, and because it hands junior people an easy rhetorical win in any current discussion of Python packaging, so of course it's popular even if it doesn't work for everyone.

The difference is essentially an attempt to decouple the environment from the runtime: a bare "pip" resolves to whichever pip is first on your PATH, which may belong to a different interpreter, while "python3 -m pip" always runs the pip that ships with that specific python3.
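
A quick way to see this in a shell (illustrative; paths differ per machine):

    # the pip on PATH may belong to a different interpreter than python3
    which pip
    pip --version

    # this always runs the pip that ships with this exact python3
    python3 -m pip --version

    # inside an activated venv the two usually agree; outside they can diverge
    python3 -m venv .venv && . .venv/bin/activate
    which pip && python3 -m pip --version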