Running a 180B parameter LLM on a single Apple M2 Ultra

(twitter.com)

255 points tbruckner | 1 comments | 07 Sep 23 14:36 UTC

adam_arthur ◴[07 Sep 23 15:32 UTC] No.37420461[source]▶

>>37419518 (OP) #

Even a linear growth rate of average RAM capacity would obviate the need to run current SOTA LLMs remotely in short order.

Historically average RAM has grown far faster than linear, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to require cloud consumption, but I suspect the sizes will not increase as quickly as hardware for inference.

Given how useful GPT-4 already is, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefitting far more from LLMs than the providers. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing.

replies(8): >>37420537 #>>37420948 #>>37421196 #>>37421214 #>>37421497 #>>37421862 #>>37421945 #>>37424918 #

cs702 ◴[07 Sep 23 16:30 UTC] No.37421497[source]▶

>>37420461 #

I agree: No one has any technological advantage when it comes to LLMs anymore. Some companies, like OpenAI, may have other advantages, like an ecosystem of developers. But most of the gobs of money that so many companies have burned to train giant proprietary models is unlikely to see any payback.

What I think will happen is that more companies will come to the realization it's in their best interest to open their giant models. The cost of training all those giant models is already a sunk cost. If there's no profit to be made by keeping a model proprietary, why not open it to gain or avoid losing mind-share, and to mess with competitors' plans?

First, it was LLaMA, with up to 65B params, opened against Meta's wishes. Then, it was LLaMA 2, with up to 70B params, opened by Meta on purpose, to mess with Google's and Microsoft/OpenAI's plans. Now, it's Falcon 180B. Like you, I'm wondering, what comes next?

replies(4): >>37421627 #>>37422256 #>>37424763 #>>37429907 #

bugglebeetle ◴[07 Sep 23 16:38 UTC] No.37421627[source]▶

>>37421497 #

I think it’s the opposite. Models will become more commoditized and closed/invisible as the basis of other service offerings. Apple isn’t going to start offering general API access to the model they’re training, but will bake it into a bunch of stuff and maybe give platform developers limited access. Meta will probably continue to drive the commoditization train because they have a killer ML/AI team, but the same thing will likely happen there once it’s the basis for a service that generates money.

replies(2): >>37422273 #>>37422892 #

foobiekr ◴[07 Sep 23 17:17 UTC] No.37422273[source]▶

>>37421627 #

This. We haven’t even entered the get-serious monetization era.

Now that the infinite free money pump has been turned down a bunch, we’re going to see what reality looks like.

replies(1): >>37430155 #

6510 ◴[08 Sep 23 06:28 UTC] No.37430155[source]▶

>>37422273 #

Okay, I'll tell you. You need to start a startup that sets up a good number of cameras at manual labor jobs. Most of the footage will be completely useless, but every day you capture a once-a-day event, every week a once-a-week event, and so on for every month, every year, every decade. Eventually you catch the guy who has worked there for 40 years whacking pipe 224 with a hammer 50 cm from the outlet, after which production resumes.

The footage can be aggressively pruned to fit on the disk.

When the robot is delivered in 2033 it can easily figure out, from the footage, all these weird and rare edge cases.

The difference will be like that between a competent but new employee and someone with 10 years of experience.

I can see the Tesla bots disassembling the production line already. Or do you think it won't happen?

replies(2): >>37454779 #>>37495611 #

1. checkyoursudo ◴[13 Sep 23 12:05 UTC] No.37495611[source]▶

>>37430155 #

Assuming that this would work, which I am fine with granting for purposes of discussion, how does this method ever let you build anything new? Or make use of advances in production methods? Or completely reconfigure a production line because of some regulatory requirement?
