
91 points | Olshansky | 8 comments

What I’m asking HN:

What does your actually useful local LLM stack look like?

I’m looking for something that provides you with real value — not just a sexy demo.

---

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.

My daily (remote) LLM stack:

  - Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.

  - Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.

  - ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.

Here’s what I’ve cobbled together for my local stack so far:

Tools

  - Ollama: for running models locally

  - Aider: Claude-code-style CLI interface

  - VSCode w/ continue.dev extension: local chat & autocomplete

Models

  - Chat: llama3.1:latest

  - Autocomplete: Qwen2.5 Coder 1.5B

  - Coding/Editing: deepseek-coder-v2:16b
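
A quick smoke test to confirm the pieces talk to each other, sketched in Python. This assumes Ollama's default localhost:11434 endpoint and the model tags above:

  # Smoke test: send one prompt to the local Ollama server and print the reply.
  import requests

  OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

  def ask(model: str, prompt: str) -> str:
      # stream=False returns a single JSON object containing the full response
      resp = requests.post(
          OLLAMA_URL,
          json={"model": model, "prompt": prompt, "stream": False},
          timeout=120,
      )
      resp.raise_for_status()
      return resp.json()["response"]

  print(ask("llama3.1:latest", "Say hi in five words."))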

Things I’m not worried about:

  - CPU/Memory (running on an M1 MacBook)

  - Cost (within reason)

  - Data privacy / being trained on (not trying to start a philosophical debate here)

I am worried about:

  - Actual usefulness (i.e. “vibes”)

  - Ease of use (tools that fit with my muscle memory)

  - Correctness (not benchmarks)

  - Latency & speed
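
Latency, at least, is easy to put numbers on. A rough benchmark sketch, assuming Ollama's default endpoint and the eval_count/eval_duration fields its streaming API reports in the final chunk:

  # Measure time-to-first-token and tokens/sec for a local Ollama model.
  import json
  import time

  import requests

  def benchmark(model: str, prompt: str) -> None:
      start = time.perf_counter()
      first_token = None
      with requests.post(
          "http://localhost:11434/api/generate",
          json={"model": model, "prompt": prompt, "stream": True},
          stream=True,
          timeout=300,
      ) as resp:
          resp.raise_for_status()
          for line in resp.iter_lines():
              if not line:
                  continue
              chunk = json.loads(line)
              if first_token is None and chunk.get("response"):
                  first_token = time.perf_counter() - start
              if chunk.get("done"):
                  # eval_duration is reported in nanoseconds
                  tps = chunk["eval_count"] / chunk["eval_duration"] * 1e9
                  print(f"{model}: first token {first_token:.2f}s, {tps:.1f} tok/s")

  benchmark("deepseek-coder-v2:16b", "Write a function that reverses a string.")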

Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.

---

Who I am

  - CTO of a small startup (5 amazing engineers)

  - 20 years of coding (since I was 13)

  - Ex-big tech

ashwinsundar ◴[] No.44573186[source]
I just go outside when my internet is down for 15 minutes a year. Or tether to my cell phone plan if the need is urgent.

I don't see the point of a local AI stack, outside of privacy or some ethical concerns (which a local stack doesn't solve anyway imo). I also *only* have 24GB of RAM on my laptop, which it sounds like isn't enough to run any of the best models. Am I missing something by not upgrading and running a high-performance LLM on my machine?
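
(For sizing: a Q4-quantized model needs roughly half a byte per parameter plus KV-cache overhead, so a 16B model like deepseek-coder-v2:16b fits in about 9–10 GB. 24 GB covers everything in OP's list, just not the frontier-scale open-weight models.)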

replies(1): >>44573265 #
1. filchermcurr ◴[] No.44573265[source]
I would say cost is a factor. Maybe not for OP, but many people aren't able to spend $135 a month on AI services.
replies(1): >>44573407 #
2. ashwinsundar ◴[] No.44573407[source]
Does the cost of a new computer not get factored in? I think I would need to spend $2000+ to run a decent model locally, and even then I can only run open-source models.

Not to mention, running a giant model locally for hours a day is sure to shorten the lifespan of the machine…

replies(3): >>44573609 #>>44573634 #>>44574438 #
3. dpoloncsak ◴[] No.44573609[source]
$2000 for a new machine is only a little over a year of AI costs for OP ($135/mo × 15 months ≈ $2000).
replies(1): >>44580499 #
4. haiku2077 ◴[] No.44573634[source]
The computer is a general purpose tool, though. You can play games, edit video and images, and self-host a movie/TV collection with real time transcoding with the same hardware. Many people have powerful PCs for playing games and running professional creative software already.

There's no reason running a model would shorten a machine's lifespan. PSUs, CPUs, motherboards, GPUs and RAM will all be long obsolete before they wear out even under full load. At worst you might have to swap thermal paste/pads a couple of years sooner. (A tube of paste is like, ten bucks.)

5. outworlder ◴[] No.44574438[source]
> Not to mention, running a giant model locally for hours a day is sure to shorten the lifespan of the machine…

That is not a thing. Unless there's something wrong (badly managed thermals, an undersized PSU at the limit of its capacity, dusty unfiltered air clogging fans, aggressive overclocking), that's what your computer is built for.

Sure, over a couple of decades there's more electromigration than would otherwise have happened at idle temps. But that's pretty much it.

> I think I would need to spend $2000+ to run a decent model locally

Not really. Repurpose second-hand parts and you can do it for a quarter of that cost. It can also be a server and do other things when you aren't running models.

6. lm28469 ◴[] No.44580499{3}[source]
Electricity isn't free, and these things are basically toasters that are continuously on.
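
For rough, illustrative numbers: a GPU drawing 300 W for eight hours a day uses 2.4 kWh, which is about $11/month at $0.15/kWh.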
replies(2): >>44582861 #>>44595430 #
7. haiku2077 ◴[] No.44582861{4}[source]
Not on current hardware. I have an AI voice bot running 24/7 on a Mac Mini in my office (it provides services for a video game's dedicated server), and the power draw above idle is minimal.
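
(For what it's worth, Apple's published spec for the M1 Mac mini is roughly 7 W at idle and under 40 W at maximum load.)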
8. trod1234 ◴[] No.44595430{4}[source]
It is effectively free when you have surplus electricity coming from the sun (PV).