
91 points | Olshansky | 8 comments

What I’m asking HN:

What does your actually useful local LLM stack look like?

I’m looking for something that provides you with real value — not just a sexy demo.

---

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.

My daily (remote) LLM stack:

  - Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.

  - Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.

  - ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.

Here’s what I’ve cobbled together for my local stack so far:

Tools

  - Ollama: for running models locally

  - Aider: Claude-code-style CLI interface

  - VSCode w/ continue.dev extension: local chat & autocomplete

Models

  - Chat: llama3.1:latest

  - Autocomplete: Qwen2.5 Coder 1.5B

  - Coding/Editing: deepseek-coder-v2:16b
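
A quick smoke test to confirm the pieces talk to each other, sketched in Python. This assumes Ollama's default localhost:11434 endpoint and the model tags above:

  # Smoke test: send one prompt to the local Ollama server and print the reply.
  import requests

  OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

  def ask(model: str, prompt: str) -> str:
      # stream=False returns a single JSON object containing the full response
      resp = requests.post(
          OLLAMA_URL,
          json={"model": model, "prompt": prompt, "stream": False},
          timeout=120,
      )
      resp.raise_for_status()
      return resp.json()["response"]

  print(ask("llama3.1:latest", "Say hi in five words."))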

Things I’m not worried about:

  - CPU/Memory (running on an M1 MacBook)

  - Cost (within reason)

  - Data privacy / being trained on (not trying to start a philosophical debate here)

I am worried about:

  - Actual usefulness (i.e. “vibes”)

  - Ease of use (tools that fit with my muscle memory)

  - Correctness (not benchmarks)

  - Latency & speed
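
Latency, at least, is easy to put numbers on. A rough benchmark sketch, assuming Ollama's default endpoint and the eval_count/eval_duration fields its streaming API reports in the final chunk:

  # Measure time-to-first-token and tokens/sec for a local Ollama model.
  import json
  import time

  import requests

  def benchmark(model: str, prompt: str) -> None:
      start = time.perf_counter()
      first_token = None
      with requests.post(
          "http://localhost:11434/api/generate",
          json={"model": model, "prompt": prompt, "stream": True},
          stream=True,
          timeout=300,
      ) as resp:
          resp.raise_for_status()
          for line in resp.iter_lines():
              if not line:
                  continue
              chunk = json.loads(line)
              if first_token is None and chunk.get("response"):
                  first_token = time.perf_counter() - start
              if chunk.get("done"):
                  # eval_duration is reported in nanoseconds
                  tps = chunk["eval_count"] / chunk["eval_duration"] * 1e9
                  print(f"{model}: first token {first_token:.2f}s, {tps:.1f} tok/s")

  benchmark("deepseek-coder-v2:16b", "Write a function that reverses a string.")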

Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.

---

Who I am

  - CTO of a small startup (5 amazing engineers)

  - 20 years of coding (since I was 13)

  - Ex-big tech

ashwinsundar ◴[] No.44573186[source]
I just go outside when my internet is down for 15 minutes a year. Or tether to my cell phone plan if the need is urgent.

I don't see the point of a local AI stack, outside of privacy or some ethical concerns (which a local stack doesn't solve anyway imo). I also *only* have 24GB of RAM on my laptop, which it sounds like isn't enough to run any of the best models. Am I missing something by not upgrading and running a high-performance LLM on my machine?
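
(For sizing: a Q4-quantized model needs roughly half a byte per parameter plus KV-cache overhead, so a 16B model like deepseek-coder-v2:16b fits in about 9–10 GB. 24 GB covers everything in OP's list, just not the frontier-scale open-weight models.)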

replies(1): >>44573265 #
1. filchermcurr ◴[] No.44573265[source]
I would say cost is a factor. Maybe not for OP, but many people aren't able to spend $135 a month on AI services.
replies(1): >>44573407 #
2. ashwinsundar ◴[] No.44573407[source]
Does the cost of a new computer not get factored in? I think I would need to spend $2000+ to run a decent model locally, and even then I can only run open-source models.

Not to mention, running a giant model locally for hours a day is sure to shorten the lifespan of the machine…

replies(3): >>44573609 #>>44573634 #>>44574438 #
3. dpoloncsak ◴[] No.44573609[source]
$2000 for a new machine is only a little over a year of AI costs for OP ($135/mo × 15 months ≈ $2000).
replies(1): >>44580499 #
4. haiku2077 ◴[] No.44573634[source]
The computer is a general purpose tool, though. You can play games, edit video and images, and self-host a movie/TV collection with real time transcoding with the same hardware. Many people have powerful PCs for playing games and running professional creative software already.

There's no reason running a model would shorten a machine's lifespan. PSUs, CPUs, motherboards, GPUs and RAM will all be long obsolete before they wear out even under full load. At worst you might have to swap thermal paste/pads a couple of years sooner. (A tube of paste is like, ten bucks.)

5. outworlder ◴[] No.44574438[source]
> Not to mention, running a giant model locally for hours a day is sure to shorten the lifespan of the machine…

That is not a thing. Unless there's something wrong (badly managed thermals, an undersized PSU at the limit of its capacity, dusty unfiltered air clogging fans, aggressive overclocking), that's what your computer is built for.

Sure, over a couple of decades there's more electromigration than would otherwise have happened at idle temps. But that's pretty much it.

> I think I would need to spend $2000+ to run a decent model locally

Not really. Repurpose second-hand parts and you can do it for a quarter of that cost. It can also be a server and do other things when you aren't running models.

6. lm28469 ◴[] No.44580499{3}[source]
Electricity isn't free, and these things are basically toasters that are continuously on.
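
For rough, illustrative numbers: a GPU drawing 300 W for eight hours a day uses 2.4 kWh, which is about $11/month at $0.15/kWh.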
replies(2): >>44582861 #>>44595430 #
7. haiku2077 ◴[] No.44582861{4}[source]
Not on current hardware. I have an AI voice bot running 24/7 on a Mac Mini in my office (it provides services for a video game's dedicated server), and the power draw above idle is minimal.
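
(For what it's worth, Apple's published spec for the M1 Mac mini is roughly 7 W at idle and under 40 W at maximum load.)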
8. trod1234 ◴[] No.44595430{4}[source]
It is effectively free when you have surplus electricity coming from the sun (PV).