Running a 180B parameter LLM on a single Apple M2 Ultra

(twitter.com)

255 points tbruckner | 1 comments | 07 Sep 23 14:36 UTC

superkuh[dead post] ◴[07 Sep 23 15:33 UTC] No.37420475[source]▶

>>37419518 (OP) #

[flagged]

sbierwagen ◴[07 Sep 23 15:34 UTC] No.37420490[source]▶

>>37420475 #

An M2 Mac Studio with 192 GB of RAM is US$5,599 right now.

replies(3): >>37420616 #>>37420693 #>>37427799 #

superkuh[dead post] ◴[07 Sep 23 15:45 UTC] No.37420693[source]▶

>>37420490 #

[flagged]

yumraj ◴[07 Sep 23 15:50 UTC] No.37420789[source]▶

>>37420693 #

It’s not useless.

It seems a Thunderbolt/USB4 external NVMe enclosure can do about 2500-3000 MB/s, which is about half the speed of the internal SSD. So not bad at all. It’ll just add a few tens of seconds when loading the model. Totally manageable.
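A rough way to check the "few tens of seconds" estimate is to divide file size by sustained read throughput. The sketch below assumes a hypothetical 90 GB model file (the thread does not state the actual size) and takes the internal SSD to be exactly twice as fast as the external enclosure, per the comment above. Real load times also depend on caching and memory mapping, so treat the numbers as ballpark only.

```python
def load_seconds(file_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read file_gb (decimal GB) at a sustained throughput in MB/s."""
    return file_gb * 1000 / throughput_mb_s


FILE_GB = 90  # assumption: hypothetical model file size, not taken from the thread
EXTERNAL_MB_S = (2500, 3000)  # external enclosure range quoted in the comment

for external in EXTERNAL_MB_S:
    internal = 2 * external  # assumption: internal SSD is about twice as fast
    extra = load_seconds(FILE_GB, external) - load_seconds(FILE_GB, internal)
    print(f"{external} MB/s external: "
          f"{load_seconds(FILE_GB, external):.0f} s to load, "
          f"{extra:.0f} s slower than internal")
```

Under these assumptions the external drive costs somewhere between 15 and 18 extra seconds per load.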

Edit: in fact this is the proper route anyway, since it lets you work with huge model files and intermediate FP16/FP32 files while quantizing. Internal storage, regardless of how much, will run out quickly.
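To see why intermediate files eat storage so quickly, here is a minimal sketch of the weights-only size of a 180B-parameter model at different precisions. It counts parameters times bits per weight and nothing else; the 4-bit row is a nominal figure, and real quantized formats carry extra per-block metadata, so actual files are somewhat larger.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only size in decimal GB: parameters * bits / 8 bits per byte."""
    return params_billions * bits_per_weight / 8


PARAMS_BILLIONS = 180  # model size from the thread title

for label, bits in (("FP32", 32), ("FP16", 16), ("4-bit (nominal)", 4)):
    print(f"{label}: {weights_gb(PARAMS_BILLIONS, bits):.0f} GB")
```

Keeping an FP32 copy, an FP16 copy, and one quantized output side by side already exceeds 1 TB by this estimate.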

replies(1): >>37420889 #

superkuh ◴[07 Sep 23 15:57 UTC] No.37420889[source]▶

>>37420789 #

>Internal storage, regardless of how much, will run out quickly.

This only applies to Macs and Mac-a-likes. Actual desktop PCs have many SATA ports and can store reasonable amounts of data without the crutch of external high-latency storage making things iffy. I say this as someone with TBs of llama models on disk, and I do quantization myself (sometimes).

BTW my computer cost <$900 with 17 TB of storage currently and can run up to a 34B 5-bit LLM. I could spend $250 more to upgrade to 128 GB of DDR4-2666 RAM and run the 65B/70B models, but 180B is out of range. You do have to spend big money for that.

replies(4): >>37421057 #>>37421079 #>>37421096 #>>37422593 #

andromeduck ◴[07 Sep 23 16:06 UTC] No.37421057[source]▶

>>37420889 #

Who TF is still using SATA with SSDs?!
