MCP in LM Studio

(lmstudio.ai)
240 points by yags | 79 comments
1. chisleu ◴[] No.44380098[source]
Just ordered a $12k Mac Studio w/ 512GB of unified RAM.

Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.

LM Studio is newish and its interface isn't perfect yet, but it's fantastic at what it does, which is bring local LLMs to the masses without them having to know much.

There is another project that people should be aware of: https://github.com/exo-explore/exo

Exo is a radically cool tool that automatically clusters all hosts on your network running Exo and pools their combined GPUs for increased throughput.

As in HPC environments, you're going to want ultra-fast interconnects, but it's all just IP-based.

replies(15): >>44380196 #>>44380217 #>>44380386 #>>44380596 #>>44380626 #>>44380956 #>>44381072 #>>44381075 #>>44381174 #>>44381177 #>>44381267 #>>44385069 #>>44386056 #>>44387384 #>>44393032 #
2. dchest ◴[] No.44380196[source]
I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to generate summaries and tags for my vibe-coded Bloomberg Terminal-style RSS reader :-) It works fine (the laptop gets hot and slow, but fine).

Probably should just use llama.cpp server/ollama and not waste a gig of memory on Electron, but I like GUIs.

replies(1): >>44380381 #
3. karmakaze ◴[] No.44380217[source]
Nice. Ironically well suited for non-Apple Intelligence.
4. minimaxir ◴[] No.44380381[source]
8 GB of RAM with local LLMs is iffy in general: an 8-bit quantized Qwen3-4B is 4.2GB on disk and likely more in memory. 16 GB is usually the minimum to be able to run decent models without compromising on heavy quantization.
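Rough back-of-the-envelope math (an illustrative sketch; the parameter count and bits-per-weight are approximations, and real GGUF files carry extra overhead beyond the weights):

```python
# Rough, assumed estimate of model memory at different quantizations.
# Numbers are illustrative; real GGUF files add metadata and mixed-precision layers.

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

for bits in (16, 8, 4):
    print(f"Qwen3-4B @ {bits}-bit ~= {weight_memory_gb(4.0, bits):.1f} GB "
          f"(plus KV cache and runtime overhead)")

# 8-bit -> ~4.0 GB, close to the 4.2 GB on-disk figure above; on an 8 GB machine
# that leaves little headroom once the OS and the KV cache are counted.
```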
replies(2): >>44382797 #>>44385257 #
5. incognito124 ◴[] No.44380386[source]
> I'm going to download it with Safari

Oof you were NOT joking

replies(1): >>44381086 #
6. sneak ◴[] No.44380596[source]
I already got one of these. I’m spoiled by Claude 4 Opus; local LLMs are slower and lower quality.

I haven’t been using it much. All it has on it is LM Studio, Ollama, and Stats.app.

> Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.

lol, yup. same.

replies(1): >>44380720 #
7. teaearlgraycold ◴[] No.44380626[source]
What are you going to do with the LLMs you run?
replies(1): >>44380685 #
8. chisleu ◴[] No.44380685[source]
Currently I'm using Gemini 2.5 and Claude 3.7 Sonnet for coding tasks.

I'm interested in using local models for code generation, but I'm not expecting much in that regard.

I'm planning to attempt fine-tuning open-source models on certain tool sets, especially MCP tools.

9. chisleu ◴[] No.44380720[source]
Yup, I'm spoiled by Claude 3.7 Sonnet right now. I had to stop using Opus for plan mode in my agent because it is just so expensive. I'm using Gemini 2.5 Pro for that now.

I'm considering ordering one of these today: https://www.newegg.com/p/N82E16816139451?Item=N82E1681613945...

It looks like it will hold 5 GPUs with a single slot open for InfiniBand.

The local models might be lower quality, but they won't be slow! :)

replies(3): >>44381101 #>>44382010 #>>44384667 #
10. prettyblocks ◴[] No.44380956[source]
I've been using Open WebUI and am pretty happy with it. Why do you like LM Studio more?
replies(3): >>44381042 #>>44381073 #>>44381909 #
11. truemotive ◴[] No.44381042[source]
Open WebUI can leverage the built-in web server in LM Studio, just FYI in case you thought it was primarily a chat interface.
12. noman-land ◴[] No.44381072[source]
I love LM Studio. It's a great tool. I'm waiting for another generation of MacBook Pros to do as you did :).
13. prophesi ◴[] No.44381073[source]
Not OP, but with LM Studio I get a chat interface out of the box for local models, while with Open WebUI I'd need to configure it to point to an OpenAI-API-compatible server (like LM Studio). LM Studio can also help determine which models will run well on your hardware.

LM Studio isn't FOSS though.

I did enjoy hooking up Open WebUI to Firefox's experimental AI Chatbot (set browser.ml.chat.hideLocalhost to false and browser.ml.chat.provider to localhost:${openwebui-port}).
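For anyone wiring up a similar setup, here's a minimal sketch of talking to LM Studio's OpenAI-compatible local server with the openai Python client; the port (1234, LM Studio's usual default) and the model handling are assumptions to adapt to whatever your local server actually exposes:

```python
# Minimal sketch: chat against a local OpenAI-compatible server
# (e.g. LM Studio's "Local Server"). Port 1234 is the usual LM Studio default.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # local endpoint, not api.openai.com
    api_key="not-needed-locally",         # local servers typically ignore the key
)

# List whatever models the local server has loaded, then chat with the first one.
models = client.models.list()
model_id = models.data[0].id

resp = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Summarize why local LLMs are useful in one sentence."}],
)
print(resp.choices[0].message.content)
```

The same client should work against any other OpenAI-compatible endpoint (Ollama, llama.cpp's server, etc.) by changing base_url.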

14. imranq ◴[] No.44381075[source]
I'd love to host my own LLMs, but I keep getting held back by the quality and affordability of cloud LLMs. Why go local unless there's private data involved?
replies(3): >>44383336 #>>44385249 #>>44388345 #
15. noman-land ◴[] No.44381086[source]
Safari to download LM Studio. LM Studio to download models. Models to download Firefox.
replies(1): >>44381629 #
16. kristopolous ◴[] No.44381101{3}[source]
The GPUs are the hard thing to find unless you want to pay something like a 50% markup.
replies(1): >>44384701 #
17. ◴[] No.44381174[source]
18. zackify ◴[] No.44381177[source]
I love LM Studio, but I'd never waste $12k like that. The memory bandwidth is too low, trust me.

Get the RTX Pro 6000 for $8.5k with double the bandwidth. It will be way better.

replies(6): >>44382823 #>>44382833 #>>44383071 #>>44386064 #>>44387179 #>>44407623 #
19. teaearlgraycold ◴[] No.44381629{3}[source]
The modern Ninite.
20. s1mplicissimus ◴[] No.44381909[source]
I recently tried Open WebUI, but it was so painful to get it running with a local model. The "first run experience" of LM Studio is pretty fire in comparison. Can't really talk about actually working with it though; still waiting for the 8GB download.
replies(1): >>44382953 #
21. evo_9 ◴[] No.44382010{3}[source]
I was using Claude 3.7 exclusively for coding, but it sure seems like it got worse suddenly about 2–3 weeks back. It went from writing pretty solid code I had to make only minor changes to, to being completely off the rails: altering files unrelated to my prompt, undoing fixes from the same conversation, reinventing db access, and ignoring the coding 'standards' established in the existing codebase. It became so untrustworthy that I finally gave OpenAI o3 a try and, honestly, I was pretty surprised how solid it has been. I've been using o3 since, and I find it generally does exactly what I ask, especially if you have a well-established project with plenty of code for it to reference.

Has Claude 3.7 seemed different lately for anyone else? It was my go-to for several months, and I'm no fan of OpenAI, but o3 has been rock solid.

replies(2): >>44383401 #>>44384695 #
22. hnuser123456 ◴[] No.44382797{3}[source]
But 8GB of Apple RAM is 16GB of normal RAM.

https://www.pcgamer.com/apple-vp-says-8gb-ram-on-a-macbook-p...

replies(2): >>44383813 #>>44383841 #
23. marci ◴[] No.44382823[source]
You can't run DeepSeek-V3/R1 on the RTX Pro 6000, not to mention the upcoming 1M-context Qwen models or the current Qwen3-235B.
replies(1): >>44404092 #
24. tymscar ◴[] No.44382833[source]
Why would they pay 2/3 of the price for something with 1/5 of the RAM?

The whole point of spending that much money, for them, is to run massive models, like the full R1, which the Pro 6000 can't.

replies(1): >>44383770 #
25. prettyblocks ◴[] No.44382953{3}[source]
Interesting. I run my local LLMs through Ollama, and it's zero trouble to get that working in Open WebUI as long as the Ollama server is running.
replies(1): >>44386320 #
26. t1amat ◴[] No.44383071[source]
(Replying to both siblings questioning this)

If the primary use case is input heavy, which is true of agentic tools, there’s a world where partial GPU offload with many channels of DDR5 system RAM leads to an overall better experience. A good GPU will process input many times faster, and with good RAM you might end up with decent output speed still. Seems like that would come in close to $12k?

And there would be no competition for models that do fit entirely inside that VRAM, for example Qwen3 32B.
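A minimal sketch of what that partial-offload setup can look like with llama-cpp-python (the model path and n_gpu_layers value are placeholders; the right layer count depends on how much VRAM you actually have):

```python
# Sketch: partial GPU offload with llama-cpp-python. The GPU handles the
# offloaded layers (and most prompt processing); the rest stays in system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-32b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=40,   # offload as many layers as fit in VRAM; -1 = all layers
    n_ctx=8192,        # context window; larger contexts cost more memory
)

out = llm.create_completion(
    "Explain the trade-off between GPU offload and system RAM in one paragraph.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```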

27. mycall ◴[] No.44383336[source]
Offline is another use case.
replies(1): >>44383597 #
28. jessmartin ◴[] No.44383401{4}[source]
It could be that the prompt and/or tool descriptions degraded in whatever tool you're using Claude through. I've definitely noticed variance across Cursor, Claude Code, etc., even with the exact same models.

Prompts + tools matter.

replies(1): >>44385534 #
29. seanmcdirmid ◴[] No.44383597{3}[source]
Nothing like playing around with LLMs on an airplane without an internet connection.
replies(2): >>44383945 #>>44388368 #
30. zackify ◴[] No.44383770{3}[source]
Because waiting forever for initial prompt processing, with a realistic number of MCP tools enabled on a prompt, is going to suck without the most bandwidth possible.

And you are never going to sit around waiting for anything larger than the 96+ GB of VRAM that the RTX Pro has.

If you're using it for background tasks and not coding, it's a different story.

replies(6): >>44384804 #>>44385388 #>>44386018 #>>44386069 #>>44388078 #>>44407647 #
31. arrty88 ◴[] No.44383813{4}[source]
I concur. I just upgraded from an M1 Air with 8GB to an M4 with 24GB. Excited to run bigger models.
replies(1): >>44386303 #
32. minimaxir ◴[] No.44383841{4}[source]
Interestingly, it was AI (Apple Intelligence) that was the primary reason Apple abandoned that stance.
33. asteroidburger ◴[] No.44383945{4}[source]
If I can afford a seat above economy with room to actually, comfortably work on a laptop, I can afford the couple bucks for wifi for the flight.
replies(2): >>44384251 #>>44388091 #
34. seanmcdirmid ◴[] No.44384251{5}[source]
If you are assuming that your Hainan Airlines flight has wifi that isn't behind the GFW, even outside of cattle class, I have some news for you...
replies(1): >>44384457 #
35. sach1 ◴[] No.44384457{6}[source]
Getting around the GFW is trivially easy.
replies(1): >>44389173 #
36. sneak ◴[] No.44384667{3}[source]
I’m firehosing about $1k/mo at Cursor on pay-as-you-go and am happy to do it (it’s delivering $2-10k of value each month).

What cards are you gonna put in that chassis?

37. sneak ◴[] No.44384695{4}[source]
Me too. (re: Claude; I haven’t switched models.) It sucks because I was happily paying >$1k/mo in usage charges and then it all went south.
38. sneak ◴[] No.44384701{4}[source]
That’s just what they cost; MSRP is irrelevant. They’re not hard to find, they’re just expensive.
39. johndough ◴[] No.44384804{4}[source]
If the MCP tools come first in the conversation, it should be technically possible to cache the activations so you do not have to recompute them each time.
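llama.cpp's server already exposes something along these lines; here's a minimal sketch, assuming a llama-server running on localhost:8080 with a model loaded and that its cache_prompt flag behaves as documented (the prefix and prompts below are placeholders):

```python
# Sketch: reuse the KV cache for a long static prefix (system prompt + MCP tool
# descriptions) with llama.cpp's llama-server. Assumes a server on localhost:8080.
import requests

STATIC_PREFIX = (
    "System: you are a coding assistant.\n"
    "Tool definitions: ...several KB of MCP tool schemas...\n"  # identical every turn
)

def ask(question: str) -> str:
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": STATIC_PREFIX + "User: " + question + "\nAssistant:",
            "n_predict": 256,
            "cache_prompt": True,  # only the new suffix should need prompt processing
        },
        timeout=600,
    )
    return resp.json()["content"]

print(ask("List the files in the repo."))
print(ask("Now summarize README.md."))  # second call should skip re-processing the prefix
```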
40. PeterStuer ◴[] No.44385249[source]
Same. For 'sovereignty' reasons I will eventually move to local processing, but for now, in development/prototyping, the gap with hosted LLMs seems too wide.
41. dchest ◴[] No.44385257{3}[source]
It's 4-bit quantized (Q4_K_M, 2.5 GB) and still works well for this task. It's amazing. I've been running various small models on this 8 GB Air since the first LLaMA and GPT-J, and they've improved so much!

macOS virtual memory does a good job of swapping things in and out of the SSD.

42. pests ◴[] No.44385388{4}[source]
Initial prompt processing with a large static context (system prompt + tools + whatever) could technically be improved by checkpointing the model state and reusing it for future prompts. Not sure if any tools support this.
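One way to sketch the idea is with llama-cpp-python's save_state/load_state, assuming they behave as documented (the model path and prompts are placeholders, and actual prefix reuse depends on the library's prefix-matching behavior):

```python
# Sketch: evaluate the static context once, snapshot the model state, then
# restore that snapshot before each new prompt instead of re-processing the prefix.
from llama_cpp import Llama

llm = Llama(model_path="./model-q4_k_m.gguf", n_ctx=8192)  # placeholder model path

static_prefix = "System prompt + tool definitions ..."      # the large fixed part
llm.eval(llm.tokenize(static_prefix.encode("utf-8")))        # process it once
checkpoint = llm.save_state()                                # snapshot KV cache + position

def complete(user_prompt: str) -> str:
    llm.load_state(checkpoint)                               # rewind to the cached prefix
    # Tokenization at the prefix/prompt boundary can differ slightly, so reuse
    # may be partial; this is a sketch of the mechanism, not a tuned implementation.
    out = llm.create_completion(static_prefix + user_prompt, max_tokens=256)
    return out["choices"][0]["text"]
```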
replies(1): >>44403891 #
43. esskay ◴[] No.44385534{5}[source]
Cursor became awful over the last few weeks, so it's likely them. No idea what they did to their prompt, but it's just been incredibly poor at most tasks regardless of which model you pick.
44. tucnak ◴[] No.44386018{4}[source]
https://docs.vllm.ai/projects/production-stack/en/latest/tut...
45. storus ◴[] No.44386056[source]
If the rumors about splitting CPU/GPU in new Macs are true, your Mac Studio will be the last one capable of running DeepSeek R1 671B Q4. It looks like Apple had an accidental winner that will go away with the end of unified RAM.
replies(1): >>44387131 #
46. storus ◴[] No.44386064[source]
The RTX Pro 6000 can't do DeepSeek R1 671B Q4; you'd need 5-6 of them, which makes it way more expensive. Moreover, the Mac Studio will do it at 150W, whereas the Pro 6000 would start at 1500W.
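Back-of-the-envelope math behind that card count (illustrative numbers only; real quant file sizes and KV-cache needs vary):

```python
# Back-of-the-envelope: how many 96 GB cards does a ~Q4 671B model need?
import math

params_b = 671          # DeepSeek R1 total parameters, in billions
bits_per_weight = 4.5   # ~Q4 quants average a bit over 4 bits/weight
weights_gb = params_b * bits_per_weight / 8          # ~377 GB of weights
overhead_gb = 40                                     # assumed KV cache + activations + buffers

cards = math.ceil((weights_gb + overhead_gb) / 96)   # RTX Pro 6000: 96 GB each
print(f"~{weights_gb:.0f} GB weights + ~{overhead_gb} GB overhead -> about {cards} x 96 GB GPUs")
# -> about 5 cards, roughly matching the 5-6 figure above before long-context headroom.
```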
replies(1): >>44386270 #
47. storus ◴[] No.44386069{4}[source]
The M3 Ultra GPU is around a 3070-3080 for initial token processing. Not great, not terrible.
48. diggan ◴[] No.44386270{3}[source]
> Moreover, the Mac Studio will do it at 150W, whereas the Pro 6000 would start at 1500W.

No, the Pro 6000 pulls 600W max; not sure where you got 1500W from, that's more than double the specification.

Besides, what are the tokens/second (or seconds/token) and prompt processing speed for running DeepSeek R1 671B Q4 on a Mac Studio? Curious about those numbers, because I have a feeling they're very far apart.

replies(1): >>44395739 #
49. diggan ◴[] No.44386303{5}[source]
> m4 with 24gb

Wow, that is probably analogous to 48GB on other systems then, if we were to ask an Apple VP?

replies(1): >>44392485 #
50. diggan ◴[] No.44386320{4}[source]
I think that's the thing: compared to the full end-to-end experience of chatting in LM Studio, just running Ollama (fiddling around with terminals) is more complicated.

Of course, for folks used to terminals, daemons and so on it makes sense from the get-go, but for others it seemingly doesn't, and it doesn't help that Ollama refuses to communicate what people should understand before trying to use it.

51. phren0logy ◴[] No.44387131[source]
I have not heard this rumor. Source?
replies(1): >>44387443 #
52. smcleod ◴[] No.44387179[source]
The RTX is nice, but it's memory-limited and requires a full desktop machine to run it in. I'd take slower inference (as long as it's not less than 15 tok/s) for more memory any day!
replies(1): >>44388281 #
53. whatevsmate ◴[] No.44387384[source]
I did this a month ago and don't regret it one bit. I had a long laundry list of ML "stuff" I wanted to play with or questions to answer. There's no world in which I'm paying by the request, or token, or whatever, for hacking on fun projects. Keeping an eye on the meter is the opposite of having fun and I have absolutely nowhere I can put a loud, hot GPU (that probably has "gamer" lighting no less) in my fam's small apartment.
replies(1): >>44407585 #
54. prophesi ◴[] No.44387443{3}[source]
I believe they're talking about the rumors by an Apple supply chain analyst, Ming-Chi Kuo.

https://www.techspot.com/news/106159-apple-m5-silicon-rumore...

replies(1): >>44388382 #
55. MangoToupe ◴[] No.44388078{4}[source]
> And you are never going to sit around waiting for anything larger than the 96+ GB of VRAM that the RTX Pro has.

Am I the only person who gives aider instructions and leaves it alone for a few hours? This doesn't seem that difficult to integrate into my workflow.

replies(1): >>44388244 #
56. MangoToupe ◴[] No.44388091{5}[source]
Woah there Mr Money, slow down with these assumptions. A computer is worth the investment. But paying a cent extra to airlines? Unacceptable.
replies(1): >>44393695 #
57. diggan ◴[] No.44388244{5}[source]
> Am I the only person who gives aider instructions and leaves it alone for a few hours?

Probably not, but in my experience, if it takes longer than 10-15 minutes it's either stuck in a loop or down the wrong rabbit hole. I don't use it for vibe coding or anything "big scope" like that, though, just more focused changes/refactors, so YMMV.

58. diggan ◴[] No.44388281{3}[source]
I'd love to see more very-large-memory Mac Studio benchmarks for prompt processing and inference. The few benchmarks I've seen either failed to take prompt processing into account, didn't share the exact weights+setup used, or showed really abysmal performance.
replies(1): >>44407670 #
59. diggan ◴[] No.44388345[source]
There are some use cases I use LLMs for where I don't care a lot about the data being private (although that's a plus), but I don't want to pay XXX€ for classifying some data, and I particularly don't want to worry about having to pay that again if I want to redo it with some changes.

Using local LLMs for this, I don't worry about the price at all; I can leave it doing three tries per "task" without tripling the cost.

It's true that there is an upfront cost, but that hump is way easier to get over than on-demand/per-token costs, at least for me.

60. diggan ◴[] No.44388368{4}[source]
Some of us don't have the most reliable ISPs or even network infrastructure, and I say that as someone who lives in Spain :) I live outside a huge metropolitan area and Vodafone fiber went down twice this year, not even counting the time the country's electricity grid was down for like 24 hours.
61. diggan ◴[] No.44388382{4}[source]
Seems Apple is waking up to the fact that if it's too easy to run weights locally, there really isn't much sense in having their own remote inference endpoints, so time to stop the party :)
replies(1): >>44393922 #
62. seanmcdirmid ◴[] No.44389173{7}[source]
ya ya, just buy a VPN, pay the yearly subscription, and then have them disappear the week after you paid. Super trivially frustrating.
replies(1): >>44392519 #
63. vntok ◴[] No.44392485{6}[source]
Not sure what Apple VPs have to do with the tech, but yeah, pretty much any core engineer you ask at Apple will tell you this.

Here is a nice article with some info about what memory compression is and how it works: https://arstechnica.com/gadgets/2013/10/os-x-10-9/#page-17

It was a hard technical problem, but it has been pretty much solved since its debut in 2012-2013.

replies(1): >>44394032 #
64. vntok ◴[] No.44392519{8}[source]
VPN providers are first and foremost trust businesses. Why would you choose and pay for one that is not well established and trusted? Mine has been around for more than a decade now.

Alternatively, you could just set up your own (cheaper?) VPN relay on the tiniest VPS you can rent on AWS or IBM Cloud, right?

replies(1): >>44393687 #
65. datpuz ◴[] No.44393032[source]
I genuinely cannot wrap my head around spending this much money on hardware that is dramatically inferior to hardware that costs half the price. macOS is not even great anymore; they stopped improving their UX like a decade ago.
replies(1): >>44407595 #
66. seanmcdirmid ◴[] No.44393687{9}[source]
The VPN providers that get you over the GFW from inside China are Chinese, and China is not yet a high-trust society; just like a gym that takes your payment for a year of fees and then disappears the next week (sigh). If AWS or IBM Cloud find out you are using them as a VPN to jump the GFW, they will ban you for life; Microsoft, IBM, and Amazon aren't interested in having their whole clouds added to the GFW block list. Many people have tried this (including Microsofties in China with free Azure credits) and they've all been dealt with harshly by the cloud providers.
67. seanmcdirmid ◴[] No.44393695{6}[source]
The $3,000 that an MBP M3 Max with 64GB of RAM costs might cover a round-trip business-class ticket on a transpacific flight… if it's on sale (probably a Chinese carrier, with GFW internet).
68. prophesi ◴[] No.44393922{5}[source]
I thought their goal was to completely remove the need for a remote inference endpoint in the first place? May have read your comment wrong.
replies(1): >>44395562 #
69. pxc ◴[] No.44394032{7}[source]
I've heard good things about how macOS handles memory relative to other operating systems. But Linux and Windows both have memory compression nowadays. So the claim is then not that memory compression makes your RAM twice as effective, but that macOS' memory compression is twice as good as the real and existing memory compression available on other operating systems.

Doesn't such a claim... need stronger evidence?

70. diggan ◴[] No.44395562{6}[source]
No, I think Apple has been clear from the beginning that they won't be able to do everything on the devices themselves; that's why they're building the infrastructure/software for their "cloud intelligence system" or whatever they call it.
71. storus ◴[] No.44395739{4}[source]
You need at least 5x Pro 6000 (for smaller contexts); say the Max-Q edition running at 300W each, so overall you get a minimum of 1500W.

You get around 6 tokens/second, which is not great but not terrible. If you use very long prompts, things get bad.

72. 112233 ◴[] No.44403891{5}[source]
Dropping in late to this discussion, but is there any way to "comfortably" use multiple precomputed KV caches with current models, in the style of this work: https://arxiv.org/abs/2212.10947 ?

Meaning, I pre-parse multiple documents, and the prompt and completion attention sees all of them, but there is no attention between the documents (they are all encoded in the same overlapping positions).

This way you can include a basically unlimited amount of data in the prompt, paying for it with performance.

73. 112233 ◴[] No.44404092{3}[source]
I can run full DeepSeek R1 on an M1 Max with 64GB of RAM: around 0.5 t/s with a small quant. A Q4 quant of Maverick (253 GB) runs at 2.3 t/s on it (no GPU offload).

Practically, a last-gen or even ES/QS EPYC or Xeon (with AMX), enough RAM to fill all 8 or 12 channels, plus fast storage (4 Gen5 NVMe drives are almost 60 GB/s) looks, on paper at least, like the cheapest way to run these huge MoE models at hobbyist speeds.

replies(1): >>44455060 #
74. chisleu ◴[] No.44407585[source]
Right on. I also have a laundry list of ML things I want to do, starting with fine-tuning models.

I don't mind paying for models to do things like code; I like to move really fast when I'm coding. But for other things, I just didn't want to spend a week or two coming up to speed on the hardware needed to build a GPU system. You can just order a big GPU box, but it's going to cost you astronomically right now. Building a system with 4-5 PCIe 5.0 x16 slots, enough power, enough PCIe lanes... it's a lot to learn. You can't go on PCPartPicker and just hunt for a motherboard with 6 double-width slots.

This is a machine to let me do some things with local models. My first goal is to run some quantized version of the new V3 model and try to use it for coding tasks.

I expect it will be slow for sure, but I just want to know what it's capable of.

75. chisleu ◴[] No.44407595[source]
How can you say something so brave, and so wrong?
76. chisleu ◴[] No.44407623[source]
Only on HN can buying a $12k badass computer be a waste of money.
77. chisleu ◴[] No.44407647{4}[source]
You are correct that inference speed per $ is not optimized with this purchase.

What is optimized is the ability to fine-tune medium-size models (~200GB) per $.

You just can't get 500GB of VRAM for less than $100k. Even with $9k Blackwell cards, you have $10k in a barebones GPU server. You can't use commodity hardware and cluster it, because you need fast interconnects; I'm talking 200-400GB/s interconnects. And those take yet another PCIe slot and require expensive InfiniBand switches.

Shit gets costly fast. I agonized over this purchase for weeks, eventually deciding that it's the easiest path to success for my purposes. Not for everyone's, but for mine.

78. chisleu ◴[] No.44407670{4}[source]
Oh I plan to produce a ton of that. I'll post a blog on it to HN and /r/localllama when I'm done.
79. marci ◴[] No.44455060{4}[source]
If you're talking about DeepSeek R1 with llama.cpp and mmap, then at this point you can run DeepSeek R1 on a Raspberry Pi Zero with a 256GB microSD card and a phone charger. The only metric left to measure is one's patience.