Running a 180B parameter LLM on a single Apple M2 Ultra

(twitter.com)

255 points tbruckner | 2 comments | 07 Sep 23 14:36 UTC


adam_arthur ◴[07 Sep 23 15:32 UTC] No.37420461[source]▶

>>37419518 (OP) #

Even a linear growth rate of average RAM capacity would obviate the need to run current SOTA LLMs remotely in short order.

Historically average RAM has grown far faster than linear, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to require cloud consumption, but I suspect the sizes will not increase as quickly as hardware for inference.

Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefitting far more from LLMs than the providers. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing.

replies(8): >>37420537 #>>37420948 #>>37421196 #>>37421214 #>>37421497 #>>37421862 #>>37421945 #>>37424918 #

ls612 ◴[07 Sep 23 15:59 UTC] No.37420948[source]▶

>>37420461 #

For me the test is: when will a Siri-LLM be able to run locally on my iPhone at GPT-4 level or better? 2030? Farther out? Never because of governments forbidding it? To what extent will improvements be driven by the last gasps of Moore’s Law vs by improving model architectures to be more efficient?

replies(3): >>37420983 #>>37421670 #>>37422133 #

adam_arthur ◴[07 Sep 23 16:02 UTC] No.37420983[source]▶

>>37420948 #

Given that phones are a few years behind PCs on RAM, likely whenever the average PC can do it, plus a few years. There are phones out there with 24GB of RAM already, it looks like.

Of course battery life would be a concern there, so I think LLM usage on phones will remain in the cloud.

I haven't studied phone RAM capacity growth rates in detail, though.

replies(2): >>37421363 #>>37425019 #

baq ◴[07 Sep 23 20:21 UTC] No.37425019[source]▶

>>37420983 #

Wonder if someone is thinking of LLM-specific RAM, slower but much denser. Bonus points for not having to reload the model after power cycling.

Maybe call this fantastic technology something idiotic like 3D XPoint?

replies(2): >>37425474 #>>37426620 #

AnthonyMouse ◴[07 Sep 23 22:41 UTC] No.37426620[source]▶

>>37425019 #

> slower but much denser. Bonus points for not having to reload the model after power cycling.

This is called a solid state drive.

replies(1): >>37430833 #

baq ◴[08 Sep 23 08:00 UTC] No.37430833[source]▶

>>37426620 #

Goes to show how badly Intel executed that one.

replies(1): >>37430876 #

AnthonyMouse ◴[08 Sep 23 08:05 UTC] No.37430876[source]▶

>>37430833 #

What? You can do this right now. Put your >100GB model on your SSD in your computer with <100GB of RAM and use mmap. It's not fast, but it runs.
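
For readers unfamiliar with the mmap approach mentioned above, here is a minimal sketch in Python. It is not how any particular inference engine loads weights; the file layout (a flat array of little-endian float32 values) and the helper names `write_dummy_weights`, `read_weight`, and `demo` are assumptions made purely for illustration. The point it shows is that a memory-mapped file is paged in from disk on demand, so the file can in principle be larger than physical RAM.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per float32


def write_dummy_weights(path, n_floats):
    """Write n_floats little-endian float32 values (0.0, 1.0, 2.0, ...).

    Hypothetical stand-in for a real model file.
    """
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{n_floats}f", *map(float, range(n_floats))))


def read_weight(mm, index):
    """Read one float32 from the mapping.

    Only the page containing this offset needs to be resident; the OS
    fetches it from disk on first access and may evict it later.
    """
    return struct.unpack_from("<f", mm, index * FLOAT_SIZE)[0]


def demo():
    """Create a small stand-in file, map it read-only, read two values."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_dummy_weights(path, 1_000_000)  # about 4 MB, not 100 GB
        with open(path, "rb") as f:
            # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return read_weight(mm, 0), read_weight(mm, 999_999)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

With a real >100GB file the same calls apply, but every pass over the weights would be limited by SSD read speed, which matches the "it's not fast, but it runs" caveat.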

replies(1): >>37432010 #

1. baq ◴[08 Sep 23 10:54 UTC] No.37432010[source]▶

>>37430876 #

My point is Intel had the perfect tech for this and killed it.

https://en.wikipedia.org/wiki/3D_XPoint

replies(1): >>37435180 #

2. AnthonyMouse ◴[08 Sep 23 15:47 UTC] No.37435180[source]▶

>>37432010 (TP) #

They didn't, really. What this wants is gobs of memory bandwidth. The fastest NVMe SSDs can essentially saturate the PCIe bus. Using a dozen or more of them in parallel might even have reasonable performance for this. (Most desktops don't have this many PCIe lanes, but HEDT and servers do.) And they're a lot cheaper than Optane was.

To do better than that would have required the version of Optane that used DIMM slots, which was something like a quarter of the performance of actual DRAM for half the price.

So you had something that costs more than ordinary SSDs if your priority is cost and is slower than DRAM if your priority is performance. A lot of times a middle ground like that is still valuable, but since cache hierarchies are a thing, having a bit of fast DRAM and a lot of cheap SSD serves that part of the market well too.

And in the meantime ordinary SSDs got faster and cheaper and DRAM got faster and cheaper. Now you can get older systems with previous-generation DRAM that are faster than Optane for less money. They stopped making it because people stopped buying it.
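
The bandwidth argument in the comment above can be made concrete with a rough back-of-the-envelope sketch. It assumes a dense model in which every weight is read once per generated token, so throughput is bounded by bandwidth divided by model size. The 100 GB model size echoes the ">100GB" figure upthread; the 7 GB/s per-drive figure is an assumed value for a fast PCIe 4.0 NVMe SSD, and 800 GB/s is Apple's quoted peak memory bandwidth for the M2 Ultra. The function and constant names are hypothetical, and real results would be lower once compute, caching, and I/O overhead are counted.

```python
def tokens_per_second(bandwidth_gb_s, model_size_gb):
    """Upper bound on tokens/s if each token requires reading all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


MODEL_GB = 100.0  # assumed model size, per the ">100GB" example upthread

# Assumed, illustrative bandwidth figures in GB/s.
SCENARIOS = {
    "1 NVMe SSD": 7.0,
    "12 NVMe SSDs in parallel": 12 * 7.0,
    "M2 Ultra unified memory": 800.0,
}


def estimate_all():
    """Return the estimated upper bound in tokens/s for each scenario."""
    return {name: tokens_per_second(bw, MODEL_GB) for name, bw in SCENARIOS.items()}


if __name__ == "__main__":
    for name, tps in estimate_all().items():
        print(f"{name}: at most ~{tps:.2f} tokens/s")
```

Under these assumptions a dozen drives land a little under one token per second, while DRAM-class bandwidth is roughly an order of magnitude ahead, which is the gap the comment is pointing at.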
