MCP in LM Studio

(lmstudio.ai)
225 points by yags | 7 comments
chisleu ◴[] No.44380098[source]
Just ordered a $12k Mac Studio w/ 512GB of integrated RAM.

Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.

LM Studio is newish, and it's not a perfect interface yet, but it's fantastic at what it does, which is bringing local LLMs to the masses w/o them having to know much.

There is another project that people should be aware of: https://github.com/exo-explore/exo

Exo is this radically cool tool that automatically clusters all hosts on your network running Exo and uses their combined GPUs for increased throughput.

As in HPC environments, you are going to want ultra-fast interconnects, but it's all just IP-based.
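
Once a couple of machines are clustered, IIRC any node serves a ChatGPT-compatible API, so you can point plain HTTP at the whole cluster. Rough sketch in Python (the port and model name are my assumptions, use whatever your exo console actually prints):

    import requests

    # exo exposes an OpenAI-style chat endpoint on each node; the port
    # and model name here are assumptions, check your exo console.
    resp = requests.post(
        "http://localhost:52415/v1/chat/completions",
        json={
            "model": "llama-3.2-3b",
            "messages": [{"role": "user", "content": "hello from the cluster"}],
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])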

replies(14): >>44380196 #>>44380217 #>>44380386 #>>44380596 #>>44380626 #>>44380956 #>>44381072 #>>44381075 #>>44381174 #>>44381177 #>>44381267 #>>44385069 #>>44386056 #>>44387384 #
zackify ◴[] No.44381177[source]
I love LM Studio but I'd never waste $12k like that. The memory bandwidth is too low, trust me.

Get the RTX Pro 6000 for $8.5k with double the bandwidth. It will be way better.

replies(5): >>44382823 #>>44382833 #>>44383071 #>>44386064 #>>44387179 #
tymscar ◴[] No.44382833[source]
Why would they pay 2/3 of the price for something with 1/5 of the RAM?

The whole point of spending that much money for them is to run massive models, like the full R1, which the Pro 6000 can't.

replies(1): >>44383770 #
1. zackify ◴[] No.44383770[source]
Because waiting forever for initial prompt processing, with a realistic number of MCP tools enabled on a prompt, is going to suck without the most bandwidth possible.

And you are never going to sit around waiting for anything larger than the 96+ GB of RAM that the RTX Pro has.

If you're using it for background tasks and not coding, it's a different story.
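
Back of the envelope (every number below is an illustrative assumption, not a benchmark):

    # Rough prefill-time estimate; all figures are made-up assumptions.
    num_tools = 25                 # MCP tools exposed to the model
    tokens_per_tool = 400          # rough JSON schema size per tool
    system_tokens = 2_000
    prompt_tokens = num_tools * tokens_per_tool + system_tokens  # 12,000

    for hw, prefill_tps in [("M3 Ultra, big model", 150),
                            ("RTX Pro 6000", 3_000)]:
        print(f"{hw}: ~{prompt_tokens / prefill_tps:.0f}s to first token")

With those guesses you're staring at over a minute before the first token on the Mac, versus a few seconds on the GPU.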

replies(5): >>44384804 #>>44385388 #>>44386018 #>>44386069 #>>44388078 #
2. johndough ◴[] No.44384804[source]
If the MCP tools come first in the conversation, it should be technically possible to cache the activations (the KV cache for that prefix) so you do not have to recompute them each time.
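
Some local stacks can already do this. A minimal sketch with llama-cpp-python, assuming its set_cache / LlamaRAMCache API (the model path, prompts, and placeholder strings are mine):

    from llama_cpp import Llama, LlamaRAMCache

    SYSTEM_PROMPT = "You are a helpful assistant.\n"
    MCP_TOOLS_JSON = "{ ...tool schemas... }\n"  # placeholder

    llm = Llama(model_path="model.gguf", n_ctx=8192)  # placeholder path
    llm.set_cache(LlamaRAMCache(capacity_bytes=2 << 30))

    static_prefix = SYSTEM_PROMPT + MCP_TOOLS_JSON  # tools first, as above

    # First call pays the full prefill for the prefix...
    llm(static_prefix + "User: hi\nAssistant:", max_tokens=64)
    # ...later calls sharing that prefix reuse the cached KV state and
    # only pay for the new suffix tokens.
    llm(static_prefix + "User: plan my day\nAssistant:", max_tokens=64)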
3. pests ◴[] No.44385388[source]
Initial prompt processing with a large static context (system prompt + tools + whatever) could technically be improved by checkpointing the model state and reusing it for future prompts. Not sure if any tools support this.
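
llama-cpp-python at least exposes something that looks like this, a sketch assuming its save_state / load_state API (model path and context string are placeholders):

    from llama_cpp import Llama

    llm = Llama(model_path="model.gguf", n_ctx=8192)  # placeholder path

    # Prefill the static context (system prompt + tools) exactly once,
    # then snapshot the model state, KV cache included.
    static_context = b"<system prompt + tool definitions>"
    llm.eval(llm.tokenize(static_context))
    checkpoint = llm.save_state()

    # Any later request restores the snapshot instead of re-prefilling.
    llm.load_state(checkpoint)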
4. tucnak ◴[] No.44386018[source]
https://docs.vllm.ai/projects/production-stack/en/latest/tut...
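
For anyone skimming: vLLM also has automatic prefix caching in the core engine. A minimal sketch, assuming the enable_prefix_caching engine arg and an arbitrary model:

    from vllm import LLM, SamplingParams

    # Repeated prompt prefixes (system prompt, tool schemas) reuse
    # cached KV blocks across requests instead of being re-prefilled.
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_prefix_caching=True)
    outputs = llm.generate(
        ["<long shared prefix> question 1",
         "<long shared prefix> question 2"],
        SamplingParams(max_tokens=64),
    )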
5. storus ◴[] No.44386069[source]
The M3 Ultra GPU is around a 3070-3080 for initial token processing. Not great, not terrible.
6. MangoToupe ◴[] No.44388078[source]
> And you are never going to sit around waiting for anything larger than the 96+ GB of RAM that the RTX Pro has.

Am I the only person that gives aider instructions and leaves it alone for a few hours? This doesn't seem that difficult to integrate into my workflow.

replies(1): >>44388244 #
7. diggan ◴[] No.44388244[source]
> Am I the only person that gives aider instructions and leaves it alone for a few hours?

Probably not, but in my experience, if it takes longer than 10-15 minutes it's either stuck in a loop or down the wrong rabbit hole. I don't use it for vibe coding or anything "big scope" like that, though; more focused changes/refactors, so YMMV.