Running a 180B parameter LLM on a single Apple M2 Ultra

(twitter.com)

255 points tbruckner | 1 comments | 07 Sep 23 14:36 UTC


superkuh[dead post] ◴[07 Sep 23 15:33 UTC] No.37420475[source]▶

>>37419518 (OP) #

[flagged]

sbierwagen ◴[07 Sep 23 15:34 UTC] No.37420490[source]▶

>>37420475 #

An M2 Mac Studio with 192 GB of RAM is US$5,599 right now.

replies(3): >>37420616 #>>37420693 #>>37427799 #

superkuh[dead post] ◴[07 Sep 23 15:45 UTC] No.37420693[source]▶

>>37420490 #

[flagged]

yumraj ◴[07 Sep 23 15:50 UTC] No.37420789[source]▶

>>37420693 #

It’s not useless.

It seems a Thunderbolt/USB4 external NVMe enclosure can do about 2500-3000 MB/s, which is about half the speed of the internal SSD. So not bad at all. It'll just add a few tens of seconds when loading the model. Totally manageable.
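To make the load-time arithmetic concrete, here is a rough sketch. The 2500-3000 MB/s figures are from the comment above; the internal SSD range is inferred from "about half", and the 100 GB file size is a purely hypothetical placeholder, not something stated in the thread.

```python
def load_seconds(file_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read file_gb gigabytes sequentially at throughput_mb_s MB/s."""
    return file_gb * 1000 / throughput_mb_s


MODEL_GB = 100  # assumption: hypothetical quantized model file size

# External enclosure speeds quoted in the comment.
external = [load_seconds(MODEL_GB, mb_s) for mb_s in (2500, 3000)]
# Internal SSD speeds, assumed to be roughly double the external ones.
internal = [load_seconds(MODEL_GB, mb_s) for mb_s in (5000, 6000)]

for ext, itn in zip(external, internal):
    print(f"external {ext:.1f} s, internal {itn:.1f} s, extra {ext - itn:.1f} s")
```

Under those assumptions the external drive takes roughly twice as long, which for a file of this size works out to a couple of tens of seconds extra. It ignores caching, filesystem overhead, and anything other than sequential read speed.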

Edit: in fact this is the proper route anyway, since it lets you work with huge models and intermediate FP16/FP32 files while quantizing. Internal storage, regardless of how much, will run out quickly.

replies(1): >>37420889 #

superkuh ◴[07 Sep 23 15:57 UTC] No.37420889[source]▶

>>37420789 #

>Internal storage, regardless of how much, will run out quickly.

This only applies to Macs and Mac-a-likes. Actual desktop PCs have many SATA ports and can store reasonable amounts of data without the crutch of high-latency external storage making things iffy. I say this as someone with TBs of llama models on disk who does quantization myself (sometimes).

BTW my computer cost <$900 with 17 TB of storage currently and can run up to a 34B 5-bit LLM. I could spend $250 more to upgrade to 128 GB of DDR4-2666 RAM and run the 65B/70B, but 180B is out of range. You do have to spend big money for that.
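For a rough sense of the memory figures being discussed, the sketch below estimates weight size as parameters times bits per weight. It counts the weights only at the nominal bit width; real quantized formats also store scale data, and inference needs extra memory for the KV cache and the OS, so treat these as lower bounds rather than what any particular runtime reports.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# Model sizes mentioned in the comment, at nominal 5-bit and at FP16.
for size in (34, 70, 180):
    q5 = weight_gb(size, 5)
    fp16 = weight_gb(size, 16)
    print(f"{size}B: ~{q5:.2f} GB at 5-bit, ~{fp16:.0f} GB at FP16")
```

By this estimate the nominal 5-bit weights come to 21.25 GB for 34B, 43.75 GB for 70B, and 112.5 GB for 180B, before any overhead. The FP16 column also shows why intermediate files eat disk space so quickly.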

replies(4): >>37421057 #>>37421079 #>>37421096 #>>37422593 #

1. GeekyBear ◴[07 Sep 23 17:36 UTC] No.37422593[source]▶

>>37420889 #

> Actual desktop PCs have many SATA ports

How many of those PCs have 10 Gigabit Ethernet by default? You can set up fast networked storage in any size you like and share it with many computers, not just one.
