
255 points tbruckner | 2 comments
adam_arthur ◴[] No.37420461[source]
Even a linear growth rate in average RAM capacity would, in short order, obviate the need to run current SOTA LLMs remotely.

Historically average RAM has grown far faster than linear, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to require cloud consumption, but I suspect the sizes will not increase as quickly as hardware for inference.
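
For a rough sense of scale, here's a back-of-the-envelope sketch (my own numbers, assuming 4- or 8-bit quantized weights and ignoring KV cache and runtime overhead):

    # Rough RAM needed just to hold a model's weights at a given quantization.
    def weight_ram_gib(params_billion: float, bits_per_weight: float) -> float:
        bytes_total = params_billion * 1e9 * bits_per_weight / 8
        return bytes_total / (1024 ** 3)

    for params, bits in [(7, 4), (70, 4), (180, 4), (180, 8)]:
        print(f"{params}B @ {bits}-bit ~= {weight_ram_gib(params, bits):.0f} GiB")
    # 7B   @ 4-bit ~=   3 GiB  (already fits on a laptop)
    # 70B  @ 4-bit ~=  33 GiB  (fits in 64 GB of RAM)
    # 180B @ 4-bit ~=  84 GiB  (needs a 128 GB workstation)
    # 180B @ 8-bit ~= 168 GiB

So even modest growth in consumer RAM closes the gap quickly.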

Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefitting far more from LLMs than the providers do. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing.

replies(8): >>37420537 #>>37420948 #>>37421196 #>>37421214 #>>37421497 #>>37421862 #>>37421945 #>>37424918 #
MuffinFlavored ◴[] No.37421214[source]
> Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

Unless I'm misunderstanding, doesn't OpenAI have a vested interest in keeping their products so good/so complex/so large that consumer hobbyists can't just `git clone` an alternative that's 95% as good running locally?

replies(3): >>37421454 #>>37421498 #>>37421783 #
chongli ◴[] No.37421498[source]
What is OpenAI's moat? Loads of people outside the company are working on alternative models. They may have a lead right now but will it last a few years? Will it even last 6 months?
replies(4): >>37421647 #>>37421649 #>>37421665 #>>37422380 #
foobiekr ◴[] No.37422380[source]
Adoption, and a mass of collected human feedback that is not available in data sets gleaned from the public web.

Here’s another way to think about it. Why does ISA matter in CPUs? There are minor issues around efficiencies of various kinds, but the real advantage of any mainstream ISA is, in part, the availability of tooling (hence this was a correct and heavy early focus for the RISC-V effort), but also a lot of ecosystem things you don’t see: for example, Intel and Arm have truly mammoth test and verification suites that represent thousands of man-years of investment.

OpenAI almost certainly has a massive invisible accumulated value at this point.

The actual models themselves are the output in the same way that a packaged CPU is the output. How you got there matters as much or more.

replies(1): >>37426572 #
AnthonyMouse ◴[] No.37426572[source]
> Here’s another way to think about it. Why does ISA matter in CPUs?

Honestly the answer is that it mostly doesn't.

An ISA isn't viable without tooling, but that's why it's the first thing they all get. The only ISA with any significant moat is x86, and that's because there is so much legacy closed source software for it that people still need but would have to be emulated on any other architecture. And even that only works as long as x86 processors are competitive; if they fell behind then customers would just eat the emulation overhead on something else.

Other ISAs don't even have that. Would anybody actually be surprised if RISC-V took a huge chunk out of ARM's market share in the not too distant future?

replies(1): >>37429430 #
1. foobiekr ◴[] No.37429430{3}[source]
That's literally my point. The problem is that there's a massive amount of hidden infrastructure behind those ISAs that you don't see, and that "oh look, everyone has a big model" isn't as impressive as it sounds.
replies(1): >>37430771 #
2. AnthonyMouse ◴[] No.37430771[source]
But the open source infrastructure is getting built too, and that infrastructure is mostly independent of the model. This is Falcon 180B running on the code from llama.cpp.
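
To make that concrete, here's a minimal sketch of local inference through llama.cpp's Python bindings (llama-cpp-python); the GGUF file name and parameters are placeholders, not taken from the linked demo:

    # Any model converted to GGUF loads through the same runtime --
    # which is the point: the infrastructure is independent of the model.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./falcon-180b-q4_k_m.gguf",  # placeholder local file
        n_ctx=2048,       # context window
        n_gpu_layers=0,   # CPU-only; raise to offload layers to a GPU
    )

    out = llm("Why is local LLM inference getting cheaper?", max_tokens=128)
    print(out["choices"][0]["text"])

Swap the GGUF file for a Llama 2 checkpoint and nothing else changes.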