
255 points by tbruckner
superkuh: [flagged]
YetAnotherNick:
You just need four 3090s (~$4,000) to run it. And four 3090s are definitely a lot more powerful and versatile than an M2 Mac.
mk_stjames:
The data buffer size Georgi shows here is 96GB, plus there is other overhead; it states the recommended max working set size for this context is 147GB. So no, Falcon 180B in Q4 as shown wouldn't fit on 4x 24GB 3090s (96GB of VRAM).
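
(A rough back-of-the-envelope sketch in Swift, just to sanity-check the scale; the ~4.5 bits per weight for Q4 including block scales and the 180B parameter count are my assumptions, not numbers taken from Georgi's output:)

    import Foundation

    // Hypothetical estimate: Falcon 180B weights at ~4.5 bits/weight (Q4 with
    // per-block scales) vs. the 96GB of VRAM in four 24GB RTX 3090s.
    let params = 180e9
    let bitsPerWeight = 4.5
    let weightGB = params * bitsPerWeight / 8 / 1e9   // ~101 GB for weights alone
    let vramGB = 4.0 * 24.0                           // 96 GB total
    print(String(format: "weights ≈ %.0f GB, 4x3090 VRAM = %.0f GB", weightGB, vramGB))

Even before the KV cache and compute buffers, the weights alone land around the 100GB mark, which is already past what four 3090s hold.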

But I'm also in the quad-3090 build idea stage and bought 2 with the intention of going up to 4 eventually for this purpose. However, since I bought my first 2 a few months back (at about 800 euro each!), the eBay prices have actually gone up... a lot. I purchased a specific model that I thought would be plentiful, since I had found a seller with a lot of them from OEM pulls who was taking good offers - and suddenly they all got sold. I feel like we are entering another GPU gap like 2020-2021.

Based on the performance of Llama2 70B, I think 96GB of VRAM and the CUDA core count x bandwidth of four 3090s will hit a golden zone for price-performance in a deep learning rig that can do a bit of finetuning on top of just inference.

Unless A6000 prices (or A100 prices) start plummeting.

My only holdout is the thought that maybe Nvidia releases a 48GB Titan-type card at a less-than-A6000 price sometime soon, which would shake things up.

rogerdox14:
"recommended max working set size" is a property of the Mac the model is being run on, not the model itself. The model is smaller than that, otherwise it wouldn't be running on GPU.