Running a 180B parameter LLM on a single Apple M2 Ultra

(twitter.com)

255 points tbruckner | 2 comments | 07 Sep 23 14:36 UTC

adam_arthur ◴[07 Sep 23 15:32 UTC] No.37420461[source]▶

>>37419518 (OP) #

Even a linear growth rate of average RAM capacity would obviate the need to run current SOTA LLMs remotely in short order.

Historically average RAM has grown far faster than linear, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to require cloud consumption, but I suspect the sizes will not increase as quickly as hardware for inference.
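To put rough numbers on this, here is a back-of-the-envelope sketch. It counts weights only (no KV cache or runtime overhead), and the starting RAM, yearly increment, and doubling period are made-up illustrative inputs, not measured figures; the function names are hypothetical as well.

```python
import math

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

def years_linear(current_gb: float, needed_gb: float, step_gb_per_year: float) -> int:
    """Whole years until RAM >= needed_gb if RAM grows by a fixed amount each year."""
    return max(0, math.ceil((needed_gb - current_gb) / step_gb_per_year))

def years_doubling(current_gb: float, needed_gb: float, doubling_years: float) -> int:
    """Whole years until RAM >= needed_gb if RAM doubles every `doubling_years`."""
    if needed_gb <= current_gb:
        return 0
    return math.ceil(math.log2(needed_gb / current_gb) * doubling_years)

# A 180B-parameter model at different weight precisions.
fp16_gb = weights_gb(180, 16)  # 360.0 GB
q4_gb = weights_gb(180, 4)     # 90.0 GB

# Assumed inputs, for illustration only: 16 GB today,
# +4 GB per year (linear) or doubling every 3 years.
linear_years = years_linear(16, q4_gb, 4)
doubling_years = years_doubling(16, q4_gb, 3)
```

At 4 bits per weight the weights come to about 90 GB, which fits within the 192 GB maximum unified memory of an M2 Ultra; at 16 bits they would not. How long "short order" is depends entirely on the growth inputs you plug in.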

Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefitting far more from LLMs than the providers. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing.

replies(8): >>37420537 #>>37420948 #>>37421196 #>>37421214 #>>37421497 #>>37421862 #>>37421945 #>>37424918 #

1. visarga ◴[07 Sep 23 16:52 UTC] No.37421862[source]▶

>>37420461 #

> I think people will be surprised that consumers ultimately end up benefitting far more from LLMs than the providers.

LLMs make a great skill sharing possible: they learn from some people through the web and books, and then assist other people with their particular problems. This level of sharing and customisation is even greater and more accessible than open source.

replies(1): >>37423105 #

2. passion__desire ◴[07 Sep 23 18:07 UTC] No.37423105[source]▶

>>37421862 (TP) #

All the great points Salman Khan made about Khan Academy in his famous TED talk apply here. The only difference is that LLMs can go from ELI5 to ELIPhD in just a few back-and-forths. Then, to put the cherry on top, you can ask it to summarize the conversation in a poem written in the style of Walt Whitman.