Running a 180B parameter LLM on a single Apple M2 Ultra

(twitter.com)

255 points tbruckner | 5 comments | 07 Sep 23 14:36 UTC

adam_arthur ◴[07 Sep 23 15:32 UTC] No.37420461[source]▶

>>37419518 (OP) #

Even a linear growth rate of average RAM capacity would obviate the need to run current SOTA LLMs remotely in short order.

Historically average RAM has grown far faster than linear, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to require cloud consumption, but I suspect the sizes will not increase as quickly as hardware for inference.

Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefitting far more from LLMs than the providers. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing
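As a rough check on the RAM argument above, here is a minimal sketch of how much memory the weights of a 180B-parameter model occupy at different precisions. It counts weights only (no KV cache or activations), and the 192 GB figure for a top-spec M2 Ultra and the function name `weight_memory_gb` are assumptions added for illustration, not something stated in the thread.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: maximum unified memory on an M2 Ultra configuration.
M2_ULTRA_MAX_GB = 192

N_PARAMS = 180e9  # Falcon 180B, per the thread title

for bits in (16, 8, 4):
    gb = weight_memory_gb(N_PARAMS, bits)
    fits = "fits" if gb < M2_ULTRA_MAX_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:6.1f} GB ({fits} in {M2_ULTRA_MAX_GB} GB)")
```

Under these assumptions, only quantized weights (8-bit or lower) would fit in memory on a single machine of that size, which is why RAM capacity is the bottleneck being discussed.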

replies(8): >>37420537 #>>37420948 #>>37421196 #>>37421214 #>>37421497 #>>37421862 #>>37421945 #>>37424918 #

cs702 ◴[07 Sep 23 16:30 UTC] No.37421497[source]▶

>>37420461 #

I agree: No one has any technological advantage when it comes to LLMs anymore. Some companies, like OpenAI, may have other advantages, like an ecosystem of developers. But most of the gobs of money that so many companies have burned to train giant proprietary models is unlikely to see any payback.

What I think will happen is that more companies will come to the realization it's in their best interest to open their giant models. The cost of training all those giant models is already a sunk cost. If there's no profit to be made by keeping a model proprietary, why not open it to gain or avoid losing mind-share, and to mess with competitors' plans?

First, it was LLaMA, with up to 65B params, opened against Meta's wishes. Then, it was LLaMA 2, with up to 70B params, opened by Meta on purpose, to mess with Google's and Microsoft/OpenAI's plans. Now, it's Falcon 180B. Like you, I'm wondering, what comes next?

replies(4): >>37421627 #>>37422256 #>>37424763 #>>37429907 #

1. foobiekr ◴[07 Sep 23 17:16 UTC] No.37422256[source]▶

>>37421497 #

The cost isn’t a sunk cost at all. These models need to be trained and retrained as data sets increase. Putting aside historical cutoff points, there’s a lot of data and kinds of data not currently used, and the cost even to train the current models is incredible.

I think you guys are missing a massive technical consideration which is cost. Training cost, offering cost. As with everything else in tech, outside of the bubble created by ZIRP over the last decade and a half (and the entire two generations of tech workers who never learned this important lesson thus far in their careers), costs matter and are a primary driver of technology success.

If you attached dollar costs to these models above, if the data was available, you’d quickly discover who (if anyone) has a sustainable business model and who doesn’t.

A sustainable model is what determines long term whether a technology is available and whether that leads to further improvement (and increasing sustainability/financial value).

replies(1): >>37422898 #

2. adam_arthur ◴[07 Sep 23 17:55 UTC] No.37422898[source]▶

>>37422256 (TP) #

GPT-4 cost on the order of $100 million, per Sam Altman.

This is orders of magnitude lower than many company and government R&D budgets. It's easily financeable by 1000s of independently wealthy people and organizations. It's easily financeable by VC money. This is far cheaper than many other startups or product initiatives that have been tried. There are very likely to be many organizations that build models for the specific purpose of open sourcing the resulting model... the Falcon and Llama models are already proof enough of this.

Costs to train equivalent models may increase in the short term as the race for GPUs drives up prices... but compute will get cheaper in aggregate over time due to improving compute tech.

And once the model is built it is largely a sunk cost, yes. All that needs to happen is for a single SoTA model to be made open to completely negate any advantage a competitor has. Monetization from LLMs will be driven by focused application of the models, not from providing an interface to a general model. High quality data holds more value than the resulting model

Not every query requires timeliness of data. Incorporating new data into an existing model is likely to be cheaper than retraining the model from scratch, though that's just speculation on my end.

replies(1): >>37429443 #

3. foobiekr ◴[08 Sep 23 04:46 UTC] No.37429443[source]▶

>>37422898 #

I think you are overestimating R&D budgets for companies. Very few tech companies - even large ones - have R&D budgets in the $10B+ range, let alone $100B. Most of the Fortune 100 isn't even $10B.

replies(1): >>37430039 #

4. danielbln ◴[08 Sep 23 06:12 UTC] No.37430039{3}[source]▶

>>37429443 #

Where do you get $100B from?

replies(1): >>37435993 #

5. foobiekr ◴[08 Sep 23 16:43 UTC] No.37435993{4}[source]▶

>>37430039 #

"orders of magnitude"
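The arithmetic behind this exchange can be sketched as follows, assuming "orders of magnitude" is read as whole powers of ten applied to the roughly $100 million GPT-4 figure quoted above. The helper name `scaled_cost` is hypothetical and used only for illustration.

```python
def scaled_cost(base_usd: float, orders: int) -> float:
    """Cost that is `orders` powers of ten larger than `base_usd`."""
    return base_usd * 10**orders


GPT4_COST_USD = 100e6  # "on the order of $100 million", per the thread

for k in (1, 2, 3):
    billions = scaled_cost(GPT4_COST_USD, k) / 1e9
    print(f"{k} order(s) of magnitude above $100M: ${billions:,.0f}B")
```

On that reading, two orders of magnitude gives $10B and three gives $100B, which is where the figures in the reply above come from.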
