
255 points | tbruckner | 1 comment
adam_arthur ◴[] No.37420461[source]
Even linear growth in average RAM capacity would, in short order, obviate the need to run current SOTA LLMs remotely.

Historically, average RAM has grown far faster than linearly, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to require cloud consumption, but I suspect the sizes will not increase as quickly as hardware for inference.
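
A rough back-of-envelope (my arithmetic, not the commenter's): RAM for local inference is roughly parameter count times bytes per weight, plus some overhead for the KV cache and activations.

```python
def inference_ram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Crude RAM estimate for local LLM inference.

    Assumes weight memory dominates, with ~20% added for KV cache and
    activations -- a rough assumption, not a precise model.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# e.g. a 70B-parameter model quantized to 4 bits per weight:
print(f"{inference_ram_gb(70, 4):.0f} GB")  # ~42 GB, near high-end consumer territory
```

On that math, a doubling or two of typical consumer RAM puts today's frontier-class model sizes within local reach.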

Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefitting far more from LLMs than the providers do. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing.

replies(8): >>37420537 #>>37420948 #>>37421196 #>>37421214 #>>37421497 #>>37421862 #>>37421945 #>>37424918 #
MuffinFlavored ◴[] No.37421214[source]
> Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

Unless I'm misunderstanding, doesn't OpenAI have a vested interest in keeping their products so good/so complex/so large that consumer hobbyists can't just `git clone` an alternative that's 95% as good running locally?

replies(3): >>37421454 #>>37421498 #>>37421783 #
chongli ◴[] No.37421498[source]
What is OpenAI's moat? Loads of people outside the company are working on alternative models. They may have a lead right now but will it last a few years? Will it even last 6 months?
replies(4): >>37421647 #>>37421649 #>>37421665 #>>37422380 #
MuffinFlavored ◴[] No.37421649[source]
> What is OpenAI's moat?

From what I understand, if you take the absolute best cutting-edge LLM with the most parameters and the most up-to-date model from GitHub/HuggingFace/whatever, its output is still very far off from what you get from GPT-3.5 / GPT-4.

aka full of hallucinations, not very useful

I don't know if this is the right way to look at it, but if what George Hotz said is accurate, that GPT-4 is simply "8 220B parameter models glued together by something called a mixture-of-experts", then from what I understand, OpenAI's moat is:

- their access to GPUs/infrastructure at subsidized cost through Microsoft

- the 8 220B models themselves: they're really good, I don't think anything open source matches them, and nobody can download "all of Reddit/Twitter/Wikipedia/StackOverflow/whatever else they trained on" anymore the way they once could, given how everybody wants to protect/monetize their content now

- the "router" / MoE piece, which seems to be missing from open source offerings as well (see the sketch below for what that piece does)
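
For concreteness, here's a minimal sketch of generic top-k mixture-of-experts routing, as in the published MoE literature (e.g. Shazeer et al.'s sparsely-gated MoE). It's purely illustrative: GPT-4's actual routing is unpublished, and every size and name here is made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (illustrative sizes only)."""

    def __init__(self, d_model: int = 512, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # the "router": a learned linear layer scoring each expert per token
        self.gate = nn.Linear(d_model, n_experts)
        # each expert is an ordinary feed-forward block
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.gate(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # choose k experts per token
        weights = F.softmax(weights, dim=-1)        # normalize the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoE()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

Only the top-k experts actually run for each token, which is how a nominal 8x220B model could be served at a fraction of the compute cost of a dense model that size.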

replies(3): >>37421962 #>>37422981 #>>37426850 #
nmfisher ◴[] No.37426850[source]
This isn’t really true (or at least, it doesn’t apply across the board). Qwen (Alibaba’s open source model) outperforms GPT-4 on Chinese language tasks, and I can further finetune it for my own tasks (which I’ve done, and I can confirm it produces more natural output than GPT-4).

Other benchmarks/anecdotes suggest fine-tuned code models are outperforming GPT-4 too. The trend seems to be that smaller, fine-tuned, task-specific models outperform larger generalised models (a sketch of what such a finetune looks like is below). It takes a lot of resources to pretrain the base model, but as we’ve seen, there’s no shortage of companies willing and able to do that.
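
For anyone curious what "further finetune it for my own tasks" typically involves: here's a minimal sketch using HuggingFace transformers + peft (LoRA). The model name, rank, and target module are illustrative assumptions, not the commenter's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen-7B"  # placeholder: the commenter doesn't say which Qwen variant they used
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# LoRA trains small low-rank adapters instead of all of the base weights,
# which is what makes task-specific finetuning cheap enough to do yourself.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in Qwen; name varies by model family
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the full model

# From here, a standard transformers Trainer loop over your
# task-specific data does the rest.
```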

Not to mention, all those other companies are already profitable, whereas OpenAI is burning investor cash.