What does your actually useful local LLM stack look like?
I’m looking for something that provides you with real value — not just a sexy demo.
---
After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.
My daily (remote) LLM stack:
- Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.
- Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.
- ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.
Here’s what I’ve cobbled together for my local stack so far:
Tools
- Ollama: for running models locally
- Aider: Claude Code-style CLI interface
- VSCode w/ continue.dev extension: local chat & autocomplete
Models
- Chat: llama3.1:latest
- Autocomplete: Qwen2.5 Coder 1.5B
- Coding/Editing: deepseek-coder-v2:16b
Things I’m not worried about:
- CPU/Memory (running on an M1 MacBook)
- Cost (within reason)
- Data privacy / being trained on (not trying to start a philosophical debate here)
I am worried about:
- Actual usefulness (i.e. “vibes”)
- Ease of use (tools that fit with my muscle memory)
- Correctness (not benchmarks)
- Latency & speed (quick timing sketch below)
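A minimal timing sketch for the models listed above, assuming Ollama’s default HTTP endpoint on localhost:11434 and that /api/generate reports eval_count/eval_duration (recent builds do); the Qwen tag below is a guess at the exact Ollama model name:

```python
# Rough latency / tokens-per-second check for the local models listed above.
# Assumes Ollama is serving its default HTTP API on localhost:11434 and that
# /api/generate returns eval_count / eval_duration (true for recent builds;
# adjust field names if your version differs).
import time
import requests

MODELS = [
    "llama3.1:latest",        # chat
    "qwen2.5-coder:1.5b",     # autocomplete (tag name is an assumption)
    "deepseek-coder-v2:16b",  # coding/editing
]

PROMPT = "Write a Python function that reverses a string."

for model in MODELS:
    start = time.time()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    ).json()
    wall = time.time() - start
    # eval_duration is reported in nanoseconds; fall back to wall time if absent.
    gen_tokens = resp.get("eval_count", 0)
    gen_seconds = resp.get("eval_duration", 0) / 1e9 or wall
    print(f"{model}: {wall:.1f}s wall, ~{gen_tokens / gen_seconds:.1f} tok/s")
```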
Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.
---
Who I am
- CTO of a small startup (5 amazing engineers)
- 20 years of coding (since I was 13)
- Ex-big tech
And, is there an open source implementation of an agentic workflow (search tools and others) to use with local LLMs?
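There are open source frameworks that can drive local models with tools (LangChain and LlamaIndex both have Ollama integrations, and Open WebUI ships a web-search feature), but the core loop is small enough to roll yourself. A minimal sketch, assuming a recent Ollama with tool calling on /api/chat and a tool-capable model like llama3.1; web_search here is a stand-in you would back with a real search API or local index, and the exact response fields may differ across Ollama versions:

```python
# Minimal agentic loop with a single "search" tool against a local model
# served by Ollama. The web_search function is a placeholder stub.
import json
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "llama3.1:latest"

def web_search(query: str) -> str:
    """Placeholder search tool -- wire this to SearXNG, a local index, etc."""
    return f"(stub) no results for: {query}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return a short summary of results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What changed in the latest Ollama release?"}]

for _ in range(5):  # cap the number of tool-use rounds
    reply = requests.post(OLLAMA_CHAT, json={
        "model": MODEL, "messages": messages, "tools": TOOLS, "stream": False,
    }, timeout=300).json()["message"]
    messages.append(reply)

    tool_calls = reply.get("tool_calls") or []
    if not tool_calls:  # model answered directly -- we're done
        print(reply["content"])
        break

    for call in tool_calls:  # run each requested tool and feed the result back
        args = call["function"]["arguments"]
        if isinstance(args, str):  # some versions return arguments as a JSON string
            args = json.loads(args)
        messages.append({"role": "tool", "content": web_search(args.get("query", ""))})
```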
Seems like there would be cost advantages and always-online advantages. And the risk of a desktop computer getting damaged/stolen is much lower than for laptops.
Also, none of this is worth the money, because it's simply not possible to run the same kinds of models you pay for online on a standard home system. Something like GPT-4o uses more VRAM than you'll ever be able to scrounge up unless your budget is closer to $10,000-25,000+. Think multiple RTX A6000 cards or similar. So ultimately you're better off just paying for the hosted services.
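For a rough sense of scale: weight memory alone is about (parameter count) x (bytes per weight), before KV cache and activation overhead. A back-of-envelope sketch; the 400B row is an assumed size for illustration, since hosted model sizes aren't public:

```python
# Back-of-envelope VRAM needed just for model weights, ignoring KV cache and
# activation overhead (add roughly 10-30% on top in practice). The 400B-class
# row is an illustrative assumption, not a published spec.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9  # bytes -> GB

for name, params_b in [("16B local coder", 16), ("70B open model", 70),
                       ("400B-class model (assumed)", 400)]:
    print(f"{name}: ~{weights_gb(params_b, 4):.0f} GB at 4-bit, "
          f"~{weights_gb(params_b, 16):.0f} GB at fp16")
```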
Of course, the economics are completely at odds with any real engineering: nobody wants you to use smaller local models, and nobody wants you to consider cost or efficiency savings.
This is more of a social problem. Read through r/LocalLlama every so often and you'll see how people are optimizing their usage.