What does your actually useful local LLM stack look like?
I’m looking for something that provides you with real value — not just a sexy demo.
---
After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.
My daily (remote) LLM stack:
- Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.
- Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.
- ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.
Here’s what I’ve cobbled together for my local stack so far:
Tools
- Ollama: for running models locally
- Aider: Claude Code-style CLI interface
- VSCode w/ continue.dev extension: local chat & autocomplete
Models
- Chat: llama3.1:latest
- Autocomplete: Qwen2.5 Coder 1.5B
- Coding/Editing: deepseek-coder-v2:16b
Things I’m not worried about:
- CPU/Memory (running on an M1 MacBook)
- Cost (within reason)
- Data privacy / being trained on (not trying to start a philosophical debate here)
I am worried about:
- Actual usefulness (i.e. “vibes”)
- Ease of use (tools that fit with my muscle memory)
- Correctness (not benchmarks)
- Latency & speed (quick timing sketch below)
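A minimal timing sketch for the models listed above, assuming Ollama’s default HTTP endpoint on localhost:11434 and that /api/generate reports eval_count/eval_duration (recent builds do); the Qwen tag below is a guess at the exact Ollama model name:

```python
# Rough latency / tokens-per-second check for the local models listed above.
# Assumes Ollama is serving its default HTTP API on localhost:11434 and that
# /api/generate returns eval_count / eval_duration (true for recent builds;
# adjust field names if your version differs).
import time
import requests

MODELS = [
    "llama3.1:latest",        # chat
    "qwen2.5-coder:1.5b",     # autocomplete (tag name is an assumption)
    "deepseek-coder-v2:16b",  # coding/editing
]

PROMPT = "Write a Python function that reverses a string."

for model in MODELS:
    start = time.time()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    ).json()
    wall = time.time() - start
    # eval_duration is reported in nanoseconds; fall back to wall time if absent.
    gen_tokens = resp.get("eval_count", 0)
    gen_seconds = resp.get("eval_duration", 0) / 1e9 or wall
    print(f"{model}: {wall:.1f}s wall, ~{gen_tokens / gen_seconds:.1f} tok/s")
```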
Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.
---
Who I am
- CTO of a small startup (5 amazing engineers)
- 20 years of coding (since I was 13)
- Ex-big tech
And, is there an open source implementation of an agentic workflow (search tools and others) to use with local LLMs?
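There are open source frameworks that can drive local models with tools (LangChain and LlamaIndex both have Ollama integrations, and Open WebUI ships a web-search feature), but the core loop is small enough to roll yourself. A minimal sketch, assuming a recent Ollama with tool calling on /api/chat and a tool-capable model like llama3.1; web_search here is a stand-in you would back with a real search API or local index, and the exact response fields may differ across Ollama versions:

```python
# Minimal agentic loop with a single "search" tool against a local model
# served by Ollama. The web_search function is a placeholder stub.
import json
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "llama3.1:latest"

def web_search(query: str) -> str:
    """Placeholder search tool -- wire this to SearXNG, a local index, etc."""
    return f"(stub) no results for: {query}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return a short summary of results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What changed in the latest Ollama release?"}]

for _ in range(5):  # cap the number of tool-use rounds
    reply = requests.post(OLLAMA_CHAT, json={
        "model": MODEL, "messages": messages, "tools": TOOLS, "stream": False,
    }, timeout=300).json()["message"]
    messages.append(reply)

    tool_calls = reply.get("tool_calls") or []
    if not tool_calls:  # model answered directly -- we're done
        print(reply["content"])
        break

    for call in tool_calls:  # run each requested tool and feed the result back
        args = call["function"]["arguments"]
        if isinstance(args, str):  # some versions return arguments as a JSON string
            args = json.loads(args)
        messages.append({"role": "tool", "content": web_search(args.get("query", ""))})
```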
Seems like there would be cost advantages and always-online advantages. And the risk of a desktop computer getting damaged/stolen is much lower than for laptops.
Also, none of this is worth the money, because it's simply not possible to run the same kinds of models you pay for online on a standard home system. Something like GPT-4o uses more VRAM than you'll ever be able to scrounge up unless your budget is closer to $10,000-25,000+. Think multiple RTX A6000 cards or similar. So ultimately you're better off just paying for the hosted services.
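For a rough sense of scale: weight memory alone is about (parameter count) x (bytes per weight), before KV cache and activation overhead. A back-of-envelope sketch; the 400B row is an assumed size for illustration, since hosted model sizes aren't public:

```python
# Back-of-envelope VRAM needed just for model weights, ignoring KV cache and
# activation overhead (add roughly 10-30% on top in practice). The 400B-class
# row is an illustrative assumption, not a published spec.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9  # bytes -> GB

for name, params_b in [("16B local coder", 16), ("70B open model", 70),
                       ("400B-class model (assumed)", 400)]:
    print(f"{name}: ~{weights_gb(params_b, 4):.0f} GB at 4-bit, "
          f"~{weights_gb(params_b, 16):.0f} GB at fp16")
```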
Of course, the economics are completely at odds with any real engineering: nobody wants you to use smaller local models, and nobody wants you to consider cost or efficiency savings.
This is more of a social problem. Read through r/LocalLlama every so often and you'll see how people are optimizing their usage.