joelthelion ◴[] No.44724227[source]
Apart from using a Mac, what can you use for inference with reasonable performance? Is a Mac the only realistic option at the moment?
replies(6): >>44724398 #>>44724419 #>>44724553 #>>44724563 #>>44724959 #>>44727049 #
1. reilly3000 ◴[] No.44724419[source]
The top 3 approaches I see a lot on r/localllama are:

1. 2-4x NVIDIA 3090s (or better); some people are getting the modded Chinese 48GB cards. There's a VRAM ceiling that keeps the very biggest models from loading, but most of these rigs run most quants at great speeds.

2. Epyc servers doing CPU inference with lots of RAM and as much memory bandwidth as possible. With these setups people are getting around 5-10 t/s, but they can run 450B-parameter models.

3. High-RAM Macs (unified memory) with as much memory bandwidth as possible. They're the most balanced approach and surprisingly reasonable relative to the other options. A rough sketch of how all three setups map to llama.cpp settings is below.
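
To make that concrete, here's a minimal sketch using the llama-cpp-python bindings with a GGUF quant. The model filename, tensor_split ratios, and thread count are made-up placeholders, and exact parameter names can shift between versions; the point is just that the same few knobs cover all three setups (GPU offload/splitting for the 3090 rigs, thread count for the Epyc boxes, Metal offload on Apple Silicon).

    # Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
    # Assumes a GGUF quant on disk; the path and numbers below are placeholders.
    from llama_cpp import Llama

    # 1) Multi-GPU rig (e.g. 2x 3090): offload all layers and split across cards.
    llm = Llama(
        model_path="model.Q4_K_M.gguf",   # hypothetical filename
        n_gpu_layers=-1,                  # -1 = offload every layer to the GPUs
        tensor_split=[0.5, 0.5],          # split weights evenly across 2 cards
        n_ctx=8192,
    )

    # 2) Epyc CPU server: no GPU offload, lean on cores + memory bandwidth.
    # llm = Llama(model_path="model.Q4_K_M.gguf", n_gpu_layers=0, n_threads=64, n_ctx=8192)

    # 3) High-RAM Mac: a Metal build offloads layers straight out of unified memory.
    # llm = Llama(model_path="model.Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=8192)

    out = llm("Q: Why does memory bandwidth matter for inference?\nA:", max_tokens=64)
    print(out["choices"][0]["text"])

In every case generation speed is roughly bounded by memory bandwidth divided by the bytes touched per token, which is why the Epyc and Mac numbers land where they do.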