MCP in LM Studio

(lmstudio.ai)
225 points | yags | 1 comment
chisleu ◴[] No.44380098[source]
Just ordered a $12k Mac Studio w/ 512GB of integrated RAM.

Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.

LM Studio is newish, and it's not a perfect interface yet, but it's fantastic at what it does which is bring local LLMs to the masses w/o them having to know much.

There is another project that people should be aware of: https://github.com/exo-explore/exo

Exo is this radically cool tool that automatically clusters all hosts on your network running Exo and uses their combined GPUs for increased throughput.

As in HPC environments, you are going to need ultra-fast interconnects, but it's all just IP-based.

dchest ◴[] No.44380196[source]
I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to generate summaries and tags for my vibe-coded Bloomberg Terminal-style RSS reader :-) It works fine (the laptop gets hot and slow, but fine).

Probably should just use llama.cpp server/ollama and not waste a gig of memory on Electron, but I like GUIs.

minimaxir ◴[] No.44380381[source]
8 GB of RAM with local LLMs in general is iffy: an 8-bit quantized Qwen3-4B is 4.2 GB on disk and likely more in memory. 16 GB is usually the minimum to be able to run decent models without resorting to heavy quantization.
1. dchest ◴[] No.44385257[source]
It's 4-bit quantized (Q4_K_M, 2.5 GB) and still works well for this task. It's amazing. I've been running various small models on this 8 GB Air since the first Llama and GPT-J, and they improved so much!

macOS virtual memory does a good job of swapping stuff in and out to SSD.
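
For anyone wanting to sanity-check the sizes quoted above (4.2 GB at 8-bit, 2.5 GB at Q4_K_M), here is a rough back-of-the-envelope sketch. It assumes file size is about parameter count times effective bits per weight. The parameter count (~4.0e9 for Qwen3-4B) and the ~4.85 bits per weight for Q4_K_M are approximations I'm assuming, not figures from the thread; the 8.5 bits for Q8_0 comes from its block layout (32 int8 weights plus one 16-bit scale). The function name is made up for illustration, and runtime memory will be higher once KV cache and buffers are added.

```python
def estimate_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough quantized model file size in GB (1 GB = 1e9 bytes).

    Ignores metadata and the fact that some tensors are stored at a
    different precision than the headline quantization type.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumed values, not taken from the thread:
N_PARAMS = 4.0e9      # approximate parameter count for a "4B" model
Q8_0_BPW = 8.5        # 32 int8 weights + one fp16 scale per block
Q4_K_M_BPW = 4.85     # approximate average for a mixed 4-bit scheme

if __name__ == "__main__":
    for name, bpw in [("Q8_0", Q8_0_BPW), ("Q4_K_M", Q4_K_M_BPW)]:
        size = estimate_model_size_gb(N_PARAMS, bpw)
        print(f"{name}: ~{size:.2f} GB")
```

Under those assumptions the estimates land close to the numbers in the comments, which is why a 4-bit quant leaves some headroom on an 8 GB machine while the 8-bit one mostly doesn't.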