Where a Mac may beat the above is on the memory side: if a model requires more than 24/32 GB of GPU memory, you are usually better off with a Mac with 64/128 GB of RAM. On a Mac the memory is shared between the CPU and GPU, so the GPU can load larger models.
1. 2-4x Nvidia 3090 or better cards. Some are getting Chinese 48 GB cards. There is a VRAM ceiling that keeps the very biggest models from loading, but most of these setups can run most quants at great speeds (see the sizing sketch below this list).
2. Epyc servers running CPU inference with lots of RAM at as much memory bandwidth as is available. With these setups people are getting around 5-10 t/s, but they're able to run 450B-parameter models.
3. High-RAM Macs with as much memory bandwidth as possible. They are the best-balanced approach and surprisingly reasonable relative to the other options.
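For a back-of-the-envelope sense of which option a given model needs: weight memory is roughly parameter count times bits per weight. The sketch below uses my own example sizes, not numbers from the thread, and ignores KV cache and runtime overhead (add ~10-20% on top in practice):

    # Rough weights-only sizing; overhead and KV cache not included.
    def weights_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * bits_per_weight / 8  # 1B params @ 8-bit = 1 GB

    for label, params, bits in [
        ("13B @ 4-bit", 13, 4),    # ~6.5 GB: fits on a single 3090
        ("70B @ 4-bit", 70, 4),    # ~35 GB: 2x 24 GB cards or a 48 GB card
        ("450B @ 4-bit", 450, 4),  # ~225 GB: Epyc RAM or high-RAM Mac territory
    ]:
        print(f"{label}: ~{weights_gb(params, bits):.1f} GB")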
It can really help you figure a ton of things out before you blow the cash on your own hardware.
0.7 * 8 * ((20 * 12) - 8 - 14) * 3 = $3662
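Unpacking that arithmetic (my reading of it; the rate, hours, and day counts are assumptions implied by the numbers, not stated outright):

    # Assumed reading: $0.70/hr rental, 8-hour workdays,
    # 20 workdays/month x 12 months minus 8 holidays and 14 vacation days,
    # over 3 years.
    rate_per_hour = 0.70
    hours_per_day = 8
    workdays_per_year = 20 * 12 - 8 - 14   # 218 days
    years = 3
    total = rate_per_hour * hours_per_day * workdays_per_year * years
    print(f"${total:,.0f}")  # -> $3,662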
I bought my RTX 4090 for about $2200. I also had the pleasure of being able to use it for gaming when I wasn't working. To be fair, the VRAM requirements for local models keep climbing, and my 4090 isn't able to run many of the latest LLMs. Also, I omitted the cost of electricity from my local LLM server cost; I haven't been measuring total watts consumed by just that machine.
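For scale on that omitted electricity, a rough estimate with made-up but plausible numbers (450 W system draw under load, $0.15/kWh; neither is a measurement from the machine in question):

    # Hypothetical numbers, using the same 8 hr/day x 218 days/yr
    # schedule as the rental math above.
    watts = 450
    hours = 8 * 218 * 3            # 3 years of working hours
    kwh = watts / 1000 * hours     # ~2354 kWh
    print(f"~${kwh * 0.15:.0f}")   # -> ~$353 over 3 years

Even with that added, $2200 + ~$353 still comes in under the $3662 rental figure.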
One nice thing about renting is that it gives you flexibility in terms of what you want to try.
If you're really looking for the best deals, look at third-party hosts serving open models at API-based pricing; or honestly, a Claude subscription can easily be worth it if you use LLMs a fair bit.
2. I basically agree with your caveats: excluding electricity is a pretty big exclusion, and I don't think you've had 3 years of really high-value self-hostable models. I'd really only say the last year, and I'm somewhat skeptical of how good the ones that fit in 24 GB of VRAM are. 4x4090 is a different story.
FWIW, GPU aside, my PC isn't particularly new. It's a 5-6 year old PC that was originally the cheapest money could buy and became "decent" when I upgraded it ~5 years ago; I only added the GPU around Christmas, as prices were dropping since AMD was about to release its new GPUs.
It's not as good as unified RAM, but it's also workable.
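Assuming that's about splitting a model between GPU VRAM and system RAM on a conventional PC, here's a minimal sketch using llama-cpp-python's partial offload; the model path and layer count are placeholders:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-70b-q4_k_m.gguf",  # hypothetical local GGUF file
        n_gpu_layers=40,  # layers kept in VRAM; the rest run from system RAM
        n_ctx=4096,
    )
    out = llm("Q: Why buy a high-RAM Mac? A:", max_tokens=64)
    print(out["choices"][0]["text"])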