Most active commenters
  • diggan(8)
  • sneak(4)
  • chisleu(3)
  • seanmcdirmid(3)
  • storus(3)


MCP in LM Studio

(lmstudio.ai)
225 points by yags | 62 comments
1. chisleu ◴[] No.44380098[source]
Just ordered a $12k Mac Studio w/ 512GB of integrated RAM.

Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.

LM Studio is newish and not a perfect interface yet, but it's fantastic at what it does, which is bringing local LLMs to the masses without them having to know much.

There is another project that people should be aware of: https://github.com/exo-explore/exo

Exo is this radically cool tool that automatically clusters all hosts on your network running Exo and uses their combined GPUs for increased throughput.

As in HPC environments, you are going to need ultra-fast interconnects, but it's all IP-based.

replies(14): >>44380196 #>>44380217 #>>44380386 #>>44380596 #>>44380626 #>>44380956 #>>44381072 #>>44381075 #>>44381174 #>>44381177 #>>44381267 #>>44385069 #>>44386056 #>>44387384 #
2. dchest ◴[] No.44380196[source]
I'm using it on a MacBook Air M1 / 8 GB RAM with Qwen3-4B to generate summaries and tags for my vibe-coded Bloomberg Terminal-style RSS reader :-) It works fine (the laptop gets hot and slow, but fine).

Probably should just use llama.cpp server/ollama and not waste a gig of memory on Electron, but I like GUIs.

replies(1): >>44380381 #
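For readers curious what a setup like dchest's looks like in practice, here is a minimal sketch of summarizing and tagging a feed item through LM Studio's OpenAI-compatible local server. It assumes the server is running on LM Studio's default port (1234) with a small model such as Qwen3-4B loaded; the model identifier, API key, and prompt are illustrative, not taken from the comment.

```python
# Minimal sketch: summarize + tag an RSS item via a local OpenAI-compatible server.
# Assumes LM Studio's server at http://localhost:1234/v1 with a small model loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

def summarize_item(title: str, body: str) -> str:
    resp = client.chat.completions.create(
        model="qwen3-4b",  # whatever identifier the loaded model exposes
        messages=[
            {"role": "system", "content": "Summarize the article in two sentences, then list three tags."},
            {"role": "user", "content": f"{title}\n\n{body}"},
        ],
        temperature=0.3,
    )
    return resp.choices[0].message.content

print(summarize_item("Example headline", "Example article text..."))
```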
3. karmakaze ◴[] No.44380217[source]
Nice. Ironically well suited for non-Apple Intelligence.
4. minimaxir ◴[] No.44380381[source]
8 GB of RAM with local LLMs in general is iffy: an 8-bit quantized Qwen3-4B is 4.2GB on disk and likely more in memory. 16 GB is usually the minimum to be able to run decent models without compromising on heavy quantization.
replies(2): >>44382797 #>>44385257 #
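For a rough sense of where estimates like this come from: weight memory is approximately parameter count times bytes per weight, before the KV cache and runtime overhead are added. A back-of-the-envelope sketch (plain arithmetic, nothing model-specific):

```python
def approx_weight_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate; ignores KV cache, activations, and runtime overhead."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4):
    print(f"4B params at {bits}-bit ~ {approx_weight_gib(4, bits):.1f} GiB of weights")
# Roughly 7.5 / 3.7 / 1.9 GiB respectively, before KV cache and the OS/apps,
# which is why 8 GB of total system RAM is tight and 16 GB is more comfortable.
```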
5. incognito124 ◴[] No.44380386[source]
> I'm going to download it with Safari

Oof you were NOT joking

replies(1): >>44381086 #
6. sneak ◴[] No.44380596[source]
I already got one of these. I’m spoiled by Claude 4 Opus; local LLMs are slower and lower quality.

I haven’t been using it much. All it has on it is LM Studio, Ollama, and Stats.app.

> Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with safari.

lol, yup. same.

replies(1): >>44380720 #
7. teaearlgraycold ◴[] No.44380626[source]
What are you going to do with the LLMs you run?
replies(1): >>44380685 #
8. chisleu ◴[] No.44380685[source]
Currently I'm using Gemini 2.5 and Claude 3.7 Sonnet for coding tasks.

I'm interested in using local models for code generation, but I'm not expecting much in that regard.

I'm planning to attempt fine tuning open source models on certain tool sets, especially MCP tools.

9. chisleu ◴[] No.44380720[source]
Yup, I'm spoiled by Claude 3.7 Sonnet right now. I had to stop using Opus for plan mode in my agent because it is just so expensive. I'm using Gemini 2.5 Pro for that now.

I'm considering ordering one of these today: https://www.newegg.com/p/N82E16816139451?Item=N82E1681613945...

It looks like it will hold 5 GPUs with a single slot open for InfiniBand.

Then local models might be lower quality, but it won't be slow! :)

replies(3): >>44381101 #>>44382010 #>>44384667 #
10. prettyblocks ◴[] No.44380956[source]
I've been using openwebui and am pretty happy with it. Why do you like lm studio more?
replies(3): >>44381042 #>>44381073 #>>44381909 #
11. truemotive ◴[] No.44381042[source]
Open WebUI can leverage the built-in web server in LM Studio, just FYI in case you thought it was primarily a chat interface.
12. noman-land ◴[] No.44381072[source]
I love LM Studio. It's a great tool. I'm waiting for another generation of Macbook Pros to do as you did :).
13. prophesi ◴[] No.44381073[source]
Not OP, but with LM Studio I get a chat interface out-of-the-box for local models, while with openwebui I'd need to configure it to point to an OpenAI API-compatible server (like LM Studio). It can also help determine which models will work well with your hardware.

LM Studio isn't FOSS though.

I did enjoy hooking up OpenWebUI to Firefox's experimental AI Chatbot. (browser.ml.chat.hideLocalhost to false, browser.ml.chat.provider to localhost:${openwebui-port})

14. imranq ◴[] No.44381075[source]
I'd love to host my own LLMs, but I keep getting held back by the quality and affordability of cloud LLMs. Why go local unless there's private data involved?
replies(3): >>44383336 #>>44385249 #>>44388345 #
15. noman-land ◴[] No.44381086[source]
Safari to download LM Studio. LM Studio to download models. Models to download Firefox.
replies(1): >>44381629 #
16. kristopolous ◴[] No.44381101{3}[source]
The GPUs are the hard thing to find unless you want to pay like a 50% markup.
replies(1): >>44384701 #
17. ◴[] No.44381174[source]
18. zackify ◴[] No.44381177[source]
I love LM Studio, but I'd never waste $12k like that. The memory bandwidth is too low, trust me.

Get the RTX Pro 6000 for $8.5k with double the bandwidth. It will be way better.

replies(5): >>44382823 #>>44382833 #>>44383071 #>>44386064 #>>44387179 #
19. teaearlgraycold ◴[] No.44381629{3}[source]
The modern Ninite.
20. s1mplicissimus ◴[] No.44381909[source]
I recently tried OpenWebUI but it was so painful to get it to run with a local model. The "first run experience" of LM Studio is pretty fire in comparison. Can't really talk about actually working with it though; still waiting for the 8GB download.
replies(1): >>44382953 #
21. evo_9 ◴[] No.44382010{3}[source]
I was using Claude 3.7 exclusively for coding, but it sure seems like it got worse suddenly about 2–3 weeks back. It went from writing pretty solid code I had to make only minor changes to, to being completely off the rails: altering files unrelated to my prompt, undoing fixes from the same conversation, reinventing db access, and ignoring coding 'standards' established in the existing codebase. It became so untrustworthy I finally gave OpenAI o3 a try and, honestly, I was pretty surprised how solid it has been. I've been using o3 since, and I find it generally does exactly what I ask, especially if you have a well-established project with plenty of code for it to reference.

Just wondering if Claude 3.7 has seemed different lately for anyone else? It was my go-to for several months, and I'm no fan of OpenAI, but o3 has been rock solid.

replies(2): >>44383401 #>>44384695 #
22. hnuser123456 ◴[] No.44382797{3}[source]
But 8GB of Apple RAM is 16GB of normal RAM.

https://www.pcgamer.com/apple-vp-says-8gb-ram-on-a-macbook-p...

replies(2): >>44383813 #>>44383841 #
23. marci ◴[] No.44382823[source]
You can't run DeepSeek-V3/R1 on the RTX Pro 6000, not to mention the upcoming 1M-context Qwen models or the current Qwen3-235B.
24. tymscar ◴[] No.44382833[source]
Why would they pay 2/3 of the price for something with 1/5 of the RAM?

The whole point of spending that much money for them is to run massive models, like the full R1, which the Pro 6000 can't.

replies(1): >>44383770 #
25. prettyblocks ◴[] No.44382953{3}[source]
Interesting. I run my local LLMs through Ollama and it's zero trouble to get that working in OpenWebUI as long as the Ollama server is running.
replies(1): >>44386320 #
26. t1amat ◴[] No.44383071[source]
(Replying to both siblings questioning this)

If the primary use case is input heavy, which is true of agentic tools, there’s a world where partial GPU offload with many channels of DDR5 system RAM leads to an overall better experience. A good GPU will process input many times faster, and with good RAM you might end up with decent output speed still. Seems like that would come in close to $12k?

And there would be no competition for models that do fit entirely inside that VRAM, for example Qwen3 32B.
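Here is a minimal sketch of the kind of partial offload described above, using llama-cpp-python's n_gpu_layers to split a model between VRAM and many-channel system RAM. The model path, layer count, and context size are assumptions for illustration; in practice you raise n_gpu_layers until VRAM is nearly full.

```python
from llama_cpp import Llama

# Partial offload: n_gpu_layers layers live in VRAM, the rest stay in system RAM.
llm = Llama(
    model_path="models/qwen3-32b-q4_k_m.gguf",  # assumed local GGUF filename
    n_gpu_layers=40,    # tune upward until VRAM is nearly full
    n_ctx=16384,        # long, input-heavy contexts are where the GPU helps most
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the tradeoffs of partial GPU offload."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```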

27. mycall ◴[] No.44383336[source]
Offline is another use case.
replies(1): >>44383597 #
28. jessmartin ◴[] No.44383401{4}[source]
Could be the prompt and/or tool descriptions in whatever tool you are using Claude in that degraded. Have definitely noticed variance across Cursor, Claude Code, etc even with the exact same models.

Prompts + tools matter.

replies(1): >>44385534 #
29. seanmcdirmid ◴[] No.44383597{3}[source]
Nothing like playing around with LLMs on an airplane without an internet connection.
replies(2): >>44383945 #>>44388368 #
30. zackify ◴[] No.44383770{3}[source]
Because waiting forever for initial prompt processing with a realistic number of MCP tools enabled is going to suck without the most bandwidth possible.

And you are never going to sit around waiting for anything larger than the 96+ GB of RAM that the RTX Pro has.

If you're using it for background tasks and not coding, it's a different story.

replies(5): >>44384804 #>>44385388 #>>44386018 #>>44386069 #>>44388078 #
31. arrty88 ◴[] No.44383813{4}[source]
I concur. I just upgraded from m1 air with 8gb to m4 with 24gb. Excited to run bigger models.
replies(1): >>44386303 #
32. minimaxir ◴[] No.44383841{4}[source]
Interestingly it was AI (Apple Intelligence) that was the primary reason Apple abandoned that hedge.
33. asteroidburger ◴[] No.44383945{4}[source]
If I can afford a seat above economy with room to actually work comfortably on a laptop, I can afford the couple of bucks for wifi for the flight.
replies(2): >>44384251 #>>44388091 #
34. seanmcdirmid ◴[] No.44384251{5}[source]
If you are assuming that your Hainan Airlines flight has wifi that isn't behind the GFW, even outside of cattle class, I have some news for you...
replies(1): >>44384457 #
35. sach1 ◴[] No.44384457{6}[source]
Getting around the GFW is trivially easy.
replies(1): >>44389173 #
36. sneak ◴[] No.44384667{3}[source]
I’m firehosing about $1k/mo at Cursor on pay-as-you-go and am happy to do it (it’s delivering $2-10k of value each month).

What cards are you gonna put in that chassis?

37. sneak ◴[] No.44384695{4}[source]
Me too. (re: Claude; I haven’t switched models.) It sucks because I was happily paying >$1k/mo in usage charges and then it all went south.
38. sneak ◴[] No.44384701{4}[source]
That’s just what they cost; MSRP is irrelevant. They’re not hard to find, they’re just expensive.
39. johndough ◴[] No.44384804{4}[source]
If the MCP tools come first in the conversation, it should be technically possible to cache the activations, so you do not have to recompute them each time.
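A minimal sketch of that idea against a locally running llama.cpp server (llama-server), assuming it listens on port 8080 and supports the cache_prompt option: keep the system prompt and tool definitions byte-identical at the start of every request so the server can reuse the cached activations for that prefix. The file name and prompt format are illustrative.

```python
import requests

LLAMA_SERVER = "http://localhost:8080"  # assumed llama-server address

# Static part (system prompt + MCP tool definitions), kept byte-identical per request.
STATIC_PREFIX = open("system_prompt_and_tools.txt").read()

def ask(user_turn: str) -> str:
    resp = requests.post(
        f"{LLAMA_SERVER}/completion",
        json={
            "prompt": STATIC_PREFIX + "\nUser: " + user_turn + "\nAssistant:",
            "n_predict": 256,
            "cache_prompt": True,  # ask the server to reuse cached prefix activations
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["content"]

print(ask("List the tools you have available."))
```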
40. PeterStuer ◴[] No.44385249[source]
Same. For 'sovereignty' reasons I will eventually move to local processing, but for now, in development/prototyping, the gap with hosted LLMs seems too wide.
41. dchest ◴[] No.44385257{3}[source]
It's 4-bit quantized (Q4_K_M, 2.5 GB) and still works well for this task. It's amazing. I've been running various small models on this 8 GB Air since the first Llama and GPT-J, and they improved so much!

macOS virtual memory does a good job of swapping stuff in and out to the SSD.

42. pests ◴[] No.44385388{4}[source]
Initial prompt processing with a large static context (system prompt + tools + whatever) could technically be improved by checkpointing the model state and reusing for future prompts. Not sure if any tools support this.
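One way that could look, as a rough sketch with llama-cpp-python's save_state/load_state (the file name, token budget, and sampling loop are illustrative assumptions, not a claim about what any shipping tool does): evaluate the static context once, snapshot the state, and restore the snapshot before each new prompt.

```python
from llama_cpp import Llama

llm = Llama(model_path="models/some-model.gguf", n_ctx=8192, verbose=False)  # assumed path

# Evaluate the large static context (system prompt + tools + whatever) exactly once.
static_context = open("system_prompt_and_tools.txt").read()
llm.eval(llm.tokenize(static_context.encode("utf-8")))

# Checkpoint the model state (including the KV cache) after the static prefix.
checkpoint = llm.save_state()

def answer(question: str, max_new_tokens: int = 256) -> str:
    llm.load_state(checkpoint)  # restore "static context already processed"
    llm.eval(llm.tokenize(question.encode("utf-8"), add_bos=False))
    generated = []
    for _ in range(max_new_tokens):
        tok = llm.sample()
        if tok == llm.token_eos():
            break
        generated.append(tok)
        llm.eval([tok])
    return llm.detokenize(generated).decode("utf-8", errors="ignore")

print(answer("Which tools are available, and what do they do?"))
```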
43. esskay ◴[] No.44385534{5}[source]
Cursor became awful over the last few weeks, so it's likely them. No idea what they did to their prompt, but it's just been incredibly poor at most tasks regardless of which model you pick.
44. tucnak ◴[] No.44386018{4}[source]
https://docs.vllm.ai/projects/production-stack/en/latest/tut...
45. storus ◴[] No.44386056[source]
If the rumors about splitting CPU/GPU in new Macs are true, your MacStudio will be the last one capable of running DeepSeek R1 671B Q4. It looks like Apple had an accidental winner that will go away with the end of unified RAM.
replies(1): >>44387131 #
46. storus ◴[] No.44386064[source]
The RTX Pro 6000 can't do DeepSeek R1 671B Q4; you'd need 5-6 of them, which makes it way more expensive. Moreover, MacStudio will do it at 150W whereas Pro 6000 would start at 1500W.
replies(1): >>44386270 #
47. storus ◴[] No.44386069{4}[source]
M3 Ultra GPU is around 3070-3080 for the initial token processing. Not great, not terrible.
48. diggan ◴[] No.44386270{3}[source]
> Moreover, MacStudio will do it at 150W whereas Pro 6000 would start at 1500W.

No, the Pro 6000 pulls 600W max; not sure where you got 1500W from, that's more than double the specification.

Besides, what is the tokens/second (or seconds/token) and prompt processing speed for running DeepSeek R1 671B on a Mac Studio with Q4? Curious about those numbers, because I have a feeling they're very far off from each other.

49. diggan ◴[] No.44386303{5}[source]
> m4 with 24gb

Wow, that is probably analogous to 48GB on other systems then, if we were to ask an Apple VP?

50. diggan ◴[] No.44386320{4}[source]
I think that's the thing: just running Ollama (fiddling around with terminals) is more complicated than the full E2E of chatting with LM Studio.

Of course, for folks used to terminals, daemons and so on it makes sense from the get go, but for others it seemingly doesn't, and it doesn't help that Ollama refuses to communicate what people should understand before trying to use it.

51. phren0logy ◴[] No.44387131[source]
I have not heard this rumor. Source?
replies(1): >>44387443 #
52. smcleod ◴[] No.44387179[source]
RTX is nice, but it's memory-limited and requires a full desktop machine to run it in. I'd take slower inference (as long as it's not less than 15 tk/s) for more memory any day!
replies(1): >>44388281 #
53. whatevsmate ◴[] No.44387384[source]
I did this a month ago and don't regret it one bit. I had a long laundry list of ML "stuff" I wanted to play with or questions to answer. There's no world in which I'm paying by the request, or token, or whatever, for hacking on fun projects. Keeping an eye on the meter is the opposite of having fun and I have absolutely nowhere I can put a loud, hot GPU (that probably has "gamer" lighting no less) in my fam's small apartment.
54. prophesi ◴[] No.44387443{3}[source]
I believe they're talking about the rumors by an Apple supply chain analyst, Ming-Chi Kuo.

https://www.techspot.com/news/106159-apple-m5-silicon-rumore...

replies(1): >>44388382 #
55. MangoToupe ◴[] No.44388078{4}[source]
> And you are never going to sit around waiting for anything larger than the 96+ GB of RAM that the RTX Pro has.

Am I the only person that gives aider instructions and leaves it alone for a few hours? This doesn't seem that difficult to integrate into my workflow.

replies(1): >>44388244 #
56. MangoToupe ◴[] No.44388091{5}[source]
Woah there Mr Money, slow down with these assumptions. A computer is worth the investment. But paying a cent extra to airlines? Unacceptable.
57. diggan ◴[] No.44388244{5}[source]
> Am I the only person that gives aider instructions and leaves it alone for a few hours?

Probably not, but in my experience, if it takes longer than 10-15 minutes it's either stuck in a loop or down the wrong rabbit hole. I don't use it for vibe coding or anything "big scope" like that, though, just more focused changes/refactors, so YMMV.

58. diggan ◴[] No.44388281{3}[source]
I'd love to see more very-large-memory Mac Studio benchmarks for prompt processing and inference. The few benchmarks I've seen either failed to take prompt processing into account, didn't share the exact weights + setup that were used, or showed really abysmal performance.
59. diggan ◴[] No.44388345[source]
There are some use cases I use LLMs for where I don't care a lot about the data being private (although that's a plus), but I don't want to pay XXX€ for classifying some data, and I particularly don't want to worry about having to pay that again if I want to redo it with some changes.

Using local LLMs for this, I don't worry about the price at all; I can leave it doing three tries per "task" without tripling the cost if I wanted to.

It's true that there is an upfront cost, but it's way easier to get over that hump than on-demand/per-token costs, at least for me.

60. diggan ◴[] No.44388368{4}[source]
Some of us don't have the most reliable ISPs or even network infrastructure, and I say that as someone who lives in Spain :) I live outside a huge metropolitan area and Vodafone fiber went down twice this year, not even counting the time the country's electricity grid was down for like 24 hours.
61. diggan ◴[] No.44388382{4}[source]
Seems Apple is waking up to the fact that if it's too easy to run weights locally, there really isn't much sense in having their own remote inference endpoints, so time to stop the party :)
62. seanmcdirmid ◴[] No.44389173{7}[source]
ya ya, just buy a VPN, pay the yearly subscription, and then have them disappear the week after you paid. Super trivially frustrating.