
91 points by Olshansky | 3 comments

What I’m asking HN:

What does your actually useful local LLM stack look like?

I’m looking for something that provides you with real value — not just a sexy demo.

---

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.

My daily (remote) LLM stack:

  - Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.

  - Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.

  - ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.

Here’s what I’ve cobbled together for my local stack so far (a rough usage sketch follows the lists below):

Tools

  - Ollama: for running models locally

  - Aider: Claude-code-style CLI interface

  - VSCode w/ continue.dev extension: local chat & autocomplete

Models

  - Chat: llama3.1:latest

  - Autocomplete: Qwen2.5 Coder 1.5B

  - Coding/Editing: deepseek-coder-v2:16b
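
For concreteness, here’s a rough sketch of how the pieces get exercised outside the editor: one chat turn against the local Ollama server, standard library only. It assumes Ollama is serving on its default port (11434) and that llama3.1 has already been pulled; Aider and continue.dev end up talking to this same local endpoint.

    # Minimal sketch: one chat turn against the local Ollama server.
    # Assumes Ollama is running on its default port and llama3.1 is pulled.
    import json
    import urllib.request

    OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint

    def local_chat(prompt, model="llama3.1"):
        payload = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # return one JSON object instead of a token stream
        }).encode()
        req = urllib.request.Request(
            OLLAMA_CHAT_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["message"]["content"]

    if __name__ == "__main__":
        print(local_chat("Write a one-line docstring for a binary search function."))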

Things I’m not worried about:

  - CPU/Memory (running on an M1 MacBook)

  - Cost (within reason)

  - Data privacy / being trained on (not trying to start a philosophical debate here)

I am worried about:

  - Actual usefulness (i.e. “vibes”)

  - Ease of use (tools that fit with my muscle memory)

  - Correctness (not benchmarks)

  - Latency & speed
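
On the latency point, Ollama reports timing stats with every non-streaming response, so it’s easy to put a tokens/sec number on each model instead of going by feel. A rough sketch, assuming Ollama on its default port with the three models above pulled (field names are per current Ollama docs; treat them as an assumption if your version differs):

    # Rough tokens/sec check against the local Ollama server.
    import json
    import urllib.request

    def tokens_per_second(model, prompt):
        body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        stats = json.loads(urllib.request.urlopen(req).read())
        # eval_count = tokens generated, eval_duration = generation time in nanoseconds
        return stats["eval_count"] / (stats["eval_duration"] / 1e9)

    for m in ("llama3.1", "qwen2.5-coder:1.5b", "deepseek-coder-v2:16b"):
        print(m, round(tokens_per_second(m, "Explain a Bloom filter in two sentences."), 1), "tok/s")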

Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.

---

Who I am

  - CTO of a small startup (5 amazing engineers)

  - 20 years of coding (since I was 13)

  - Ex-big tech

clvx No.44573449
On a related subject, what’s the best hardware to run local LLMs for this use case, assuming a budget of no more than $2.5K?

And is there an open-source implementation of an agentic workflow (search tools and others) to use with local LLMs?
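
Not a framework recommendation, but for a sense of scale, the simplest possible “agentic” loop over a local model is only a few dozen lines. A sketch, assuming Ollama on its default port with llama3.1 pulled; web_search here is a placeholder you’d wire to whatever search backend you actually use:

    # Minimal "agentic" loop: the model either answers or asks for a search.
    # web_search() is a stub -- point it at SearxNG, a search API, or grep over
    # local docs. Assumes Ollama on its default port with llama3.1 pulled.
    import json
    import urllib.request

    def ollama_chat(messages, model="llama3.1"):
        body = json.dumps({"model": model, "messages": messages, "stream": False}).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/chat",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        return json.loads(urllib.request.urlopen(req).read())["message"]["content"]

    def web_search(query):
        # Placeholder: return whatever your search tool gives you, as plain text.
        return f"(no search backend wired up; query was: {query})"

    def run_agent(question, max_steps=5):
        messages = [
            {"role": "system", "content":
                "Answer the user's question. If you need to look something up, "
                "reply with exactly one line: SEARCH: <query>. Otherwise reply "
                "with the final answer."},
            {"role": "user", "content": question},
        ]
        reply = ""
        for _ in range(max_steps):
            reply = ollama_chat(messages)
            if reply.strip().startswith("SEARCH:"):
                query = reply.split("SEARCH:", 1)[1].strip()
                messages.append({"role": "assistant", "content": reply})
                messages.append({"role": "user", "content": "Search results:\n" + web_search(query)})
            else:
                return reply
        return reply  # give up after max_steps

    print(run_agent("Summarize the trade-offs of running a coding model locally."))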

1. dent9 No.44578000
You can get used RTX 3090s for $750-800 each. Pro tip: look for 2.5-slot-sized models like the EVGA XC3 or the older blower models. Then you can get two for $1,600, fit them in a full-size case, add 128GB of DDR5 for $300, a Ryzen CPU like the 9900X, and a mobo, case, and PSU to fill out the rest of the budget. If you want to skimp you can drop one of the GPUs (until you're sure you need 48GB of VRAM) and some of the RAM, but you really don't save that much. Just make sure you get a case that can fit multiple full-size GPUs and a mobo that supports it as well. The slot configurations on the AM5 generation are pretty bad for multi-GPU; you'll probably end up with a mobo such as the Asus ProArt.

Also, none of this is worth the money, because it's simply not possible to run the same kinds of models you pay for online on a standard home system. Things like GPT-4o use more VRAM than you'll ever be able to scrounge up unless your budget is closer to $10,000-25,000+; think multiple RTX A6000 cards or similar. So ultimately you're better off just paying for the hosted services.
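
For rough sizing, the back-of-envelope math behind the 48GB figure (weights only, ignoring KV cache and runtime overhead; the parameter counts and quantization levels below are illustrative assumptions, not anything from the comment above):

    # Back-of-envelope VRAM math: GB needed just to hold the weights.
    # KV cache and runtime overhead add several more GB in practice.
    def weight_gb(params_billions, bits_per_weight):
        return params_billions * 1e9 * (bits_per_weight / 8) / 1e9  # bytes -> GB

    for params, bits, label in [
        (16, 16, "16B model at fp16"),   # e.g. deepseek-coder-v2:16b, unquantized
        (16, 4,  "16B model at 4-bit"),
        (70, 4,  "70B model at 4-bit"),  # roughly what a dual-3090, 48GB box targets
        (70, 16, "70B model at fp16"),   # the class of setup that needs A6000s or a server
    ]:
        print(f"{label}: ~{weight_gb(params, bits):.0f} GB of weights")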

2. beefnugs No.44580035
I think this proves one of the suck points of AI: there are clearly certain things that the smaller models should be fine at... but there don't seem to be frameworks or anything that constantly analyze, simulate, and evaluate what you could be doing with smaller and cheaper models.

Of course, the economics are completely at odds with any real engineering: nobody wants you to use smaller local models, and nobody wants you to consider cost/efficiency savings.
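
A minimal sketch of the kind of harness being described: replay a prompt through a small local model, then let a bigger one grade the answer. Everything here is illustrative; the model names are just the ones from the top post, the judging prompt is the crudest possible LLM-as-judge setup, and it assumes Ollama on its default port:

    # "Could a smaller model have handled this?" harness, in miniature.
    import json
    import urllib.request

    def ask(model, prompt):
        body = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        }).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/chat",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        return json.loads(urllib.request.urlopen(req).read())["message"]["content"]

    def small_model_good_enough(prompt, small="qwen2.5-coder:1.5b", judge="deepseek-coder-v2:16b"):
        answer = ask(small, prompt)
        verdict = ask(judge, (
            "You are grading another model's answer.\n"
            f"Question: {prompt}\nAnswer: {answer}\n"
            "Reply with exactly ACCEPT or REJECT."
        ))
        return "ACCEPT" in verdict.upper(), answer

    ok, answer = small_model_good_enough("Write a Python one-liner to reverse a string.")
    print("small model sufficient:", ok)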

3. satvikpendem No.44637393
> but there don't seem to be frameworks or anything that constantly analyze, simulate, and evaluate what you could be doing with smaller and cheaper models

This is more of a social problem. Read through r/LocalLlama every so often and you'll see how people are optimizing their usage.