
255 points tbruckner | 9 comments
1. randomopining ◴[] No.37420598[source]
Are there any actual use cases for running this stuff on a local computer? Or are most of these models really suited only to remote clusters?
replies(3): >>37421417 #>>37421858 #>>37423070 #
2. logicchains ◴[] No.37421417[source]
The use case is that you want to generate pornographic, violent, or politically incorrect content, and would rather buy a powerful computer than rent a server (or you already own one).
replies(2): >>37423085 #>>37440704 #
3. acdha ◴[] No.37421858[source]
Here’s a simple one: corporate policy doesn’t allow you to send company data to a cloud service. There are a ton of people with significant budgets in that situation.
replies(1): >>37424659 #
4. beardedwizard ◴[] No.37423070[source]
Absolutely! Local experimentation. I built a transcription and summarization pipeline for $0. If I want it to be faster, I can move it to beefier hardware. If I fail 1000s of times it still costs me nothing.
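
A rough sketch of what that kind of pipeline can look like, assuming openai-whisper for transcription and llama-cpp-python for summarization (the audio file and .gguf path are placeholders, not my actual setup):

    import whisper
    from llama_cpp import Llama

    # Transcription runs fully offline once the Whisper weights are downloaded.
    asr = whisper.load_model("base")
    transcript = asr.transcribe("meeting.mp3")["text"]

    # Summarize with a local quantized model; the .gguf path is a placeholder.
    # A real pipeline would chunk long transcripts to fit the context window.
    llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)
    prompt = f"Summarize this transcript in five bullet points:\n\n{transcript}\n\nSummary:"
    print(llm(prompt, max_tokens=256)["choices"][0]["text"])

Every failed run costs electricity, nothing more.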

Privacy is the second case: I don't want to leak all my great ideas or data to OpenAI or anyone else.

5. beardedwizard ◴[] No.37423085[source]
You what? You can run smaller but plenty powerful models on an M1 MacBook. Idk what the porn and violence angle is, but maybe keep that one to yourself.
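
For reference, a minimal sketch of running a small quantized model on Apple silicon with llama-cpp-python (the GGUF filename is a placeholder; any small chat model works):

    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads every layer to the M1's GPU via Metal.
    # The model filename here is made up; substitute whatever you downloaded.
    llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_gpu_layers=-1)
    out = llm("Q: Why run an LLM locally? A:", max_tokens=64)
    print(out["choices"][0]["text"])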
replies(1): >>37423271 #
6. logicchains ◴[] No.37423271{3}[source]
One of the largest use cases for local LLMs is NSFW chatbots (DIY Replika, AI girlfriends/boyfriends), since the hosted services are too censored to be used for this. Yes, there are smaller models, but they're not as intelligent. Similarly, people using LLMs as a writing aid need local ones if they're writing a story (or, e.g., a DnD campaign) involving violence: the hosted ones are generally unwilling to narrate graphic violence, and the smarter the model, the better the story quality.

Given that censorship is one of the biggest complaints about hosted LLMs, it should be no surprise that some of the main use cases driving local LLMs involve creating content that censored LLMs are unwilling to create.

7. zamadatix ◴[] No.37424659[source]
I think that use case still fits the remote-cluster model better: a policy like "we can't use the cloud" doesn't mean "we have to use our individual local workstations". This approach really makes sense for "we have 1-3 people who want to push this on a budget"; beyond that, big iron makes more sense. And this still helps with that, IMO; it's just one step in getting there from "only the largest can play".
replies(1): >>37424814 #
8. acdha ◴[] No.37424814{3}[source]
Maybe, but that's potentially slower and definitely much more expensive. A lot of people in those environments can get a $6k workstation far faster than a compute cluster, which has to be supported, secured, etc.
9. catchnear4321 ◴[] No.37440704[source]
it seems infinitely cheaper to jailbreak poorly implemented, publicly facing gimmick LLM “use cases” and “demonstrations” that are thin veneers over commercial apis.

(this is not financial advice and i am not a financial advisor.)