91 points by Olshansky | 1 comment

What I’m asking HN:

What does your actually useful local LLM stack look like?

I’m looking for something that provides you with real value — not just a sexy demo.

---

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.

My daily (remote) LLM stack:

  - Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.

  - Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.

  - ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.

Here’s what I’ve cobbled together for my local stack so far:

Tools

  - Ollama: for running models locally

  - Aider: Claude-code-style CLI interface

  - VSCode w/ continue.dev extension: local chat & autocomplete (see the config sketch after the model list)

Models

  - Chat: llama3.1:latest

  - Autocomplete: Qwen2.5 Coder 1.5B

  - Coding/Editing: deepseek-coder-v2:16b
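
Roughly, the wiring between these (a sketch, not my exact files, assuming Continue's config.json schema and the Ollama tags above). Pull the models, then point ~/.continue/config.json at them:

    # fetch the models locally
    ollama pull llama3.1:latest
    ollama pull qwen2.5-coder:1.5b
    ollama pull deepseek-coder-v2:16b

    {
      "models": [
        { "title": "Llama 3.1 (chat)", "provider": "ollama", "model": "llama3.1:latest" },
        { "title": "DeepSeek Coder V2 (edit)", "provider": "ollama", "model": "deepseek-coder-v2:16b" }
      ],
      "tabAutocompleteModel": {
        "title": "Qwen2.5 Coder 1.5B",
        "provider": "ollama",
        "model": "qwen2.5-coder:1.5b"
      }
    }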

Things I’m not worried about:

  - CPU/Memory (running on an M1 MacBook)

  - Cost (within reason)

  - Data privacy / being trained on (not trying to start a philosophical debate here)

I am worried about:

  - Actual usefulness (i.e. “vibes”)

  - Ease of use (tools that fit with my muscle memory)

  - Correctness (not benchmarks)

  - Latency & speed
Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.

---

Who I am

  - CTO of a small startup (5 amazing engineers)

  - 20 years of coding (since I was 13)

  - Ex-big tech

alkh | No.44573276
I personally found Qwen2.5 Coder 7B to be on par with deepseek-coder-v2:16b (while using less RAM at inference time and running faster), so that's what I use locally. I created a custom model called "oneliner" that uses Qwen2.5 Coder 7B as a base together with this system prompt:

SYSTEM """ You are a professional coder. You goal is to reply to user's questions in a consise and clear way. Your reply must include only code orcommands , so that the user could easily copy and paste them.

Follow these guidelines for python: 1) NEVER recommend using "pip install" directly, always recommend "python3 -m pip install" 2) The following are pypi modules: ruff, pylint, black, autopep8, etc. 3) If the error is module not found, recommend installing the module using "python3 -m pip install" command. 4) If activate is not available create an environment using "python3 -m venv .venv". """
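
To recreate it, a sketch of the full setup (assuming the qwen2.5-coder:7b tag from the Ollama library):

    # Modelfile
    FROM qwen2.5-coder:7b
    SYSTEM """<the system prompt above>"""

    # register the custom model, then query it from the terminal
    ollama create oneliner -f Modelfile
    ollama run oneliner "git: undo the last commit but keep the changes"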

I specifically use it for quick questions in the terminal whose answers I can copy & paste straight away (e.g. about git). For heavy lifting I use ChatGPT Plus (my own) + GitHub Copilot (provided by my company) + Gemini (also provided by my company).

Can someone explain how one can set up autocomplete via ollama? That's something I would be interested to try.

replies(1): >>44573556
CamperBob2 | No.44573556
NEVER recommend using "pip install" directly, always recommend "python3 -m pip install"

Just out of curiosity, what's the difference?

Seems like all the cool kids are using uv.

replies(5): >>44573684, >>44573816, >>44574108, >>44578031, >>44579428
jdthedisciple | No.44574108
uv? guess I'm old school.

pip install it is for me

replies(1): >>44580236
kh_hk | No.44580236
There's nothing old school / cool kids about uv and pip. uv is a pip/venv/... interface. If you know how to use pip and venv, you know how to use uv. I use it as a useful toolchain to circumvent missing project/setup.py/requirements shenanigans