
577 points by simonw | 7 comments
joelthelion No.44724227
Apart from using a Mac, what can you use for inference with reasonable performance? Is a Mac the only realistic option at the moment?
replies(6): >>44724398 #>>44724419 #>>44724553 #>>44724563 #>>44724959 #>>44727049 #
1. whimsicalism No.44724959
You are almost certainly better off renting GPUs, but I understand self-hosting is an HN touchstone.
replies(2): >>44725021 #>>44725699 #
2. qingcharles No.44725021
This. Especially if you just want to try a bunch of different things out. Renting is insanely cheap -- to the point where I don't understand how the providers renting out the hardware are making their money back, unless they stole the hardware and power.

It can really help you figure a ton of things out before you blow the cash on your own hardware.

replies(1): >>44725157 #
3. 4b11b4 No.44725157
Recommended sites to rent from?
replies(2): >>44725244 #>>44725337 #
4. doormatt No.44725244
runpod.io
5. whimsicalism No.44725337
RunPod, Vast, Hyperbolic, Prime Intellect. If all you're going to be doing is running LLMs, you can pay per token on OpenRouter or through some of the providers listed there.
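
To make the pay-per-token route concrete, here's a minimal sketch against OpenRouter's OpenAI-compatible chat completions endpoint. The model id and the OPENROUTER_API_KEY environment variable are placeholder assumptions, not recommendations; check openrouter.ai for current model names and pricing.

    # Minimal pay-per-token sketch against OpenRouter's OpenAI-compatible API.
    # Model id and env var name are placeholders, not recommendations.
    import os
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "qwen/qwen-2.5-72b-instruct",  # example open-weight model id
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])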
6. mrinterweb No.44725699
I don't know about that. I've had my RTX 4090 for nearly 3 years now. Suppose I had a script that provisioned and deprovisioned a rented 4090 at $0.70/hr for an 8-hour work day, 20 work days per month, with 2 paid weeks off per year plus normal holidays, over 3 years:

0.7 * 8 * ((20 * 12) - 8 - 14) * 3 = $3662

I bought my RTX 4090 for about $2200. I also had the pleasure of being able to use it for gaming when I wasn't working. To be fair, the VRAM requirements for local models keep climbing, and my 4090 isn't able to run many of the latest LLMs. Also, I omitted the cost of electricity for my local LLM server; I have not been measuring the total watts consumed by just that machine.
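
If you want to rerun that comparison with your own assumptions, here is the same back-of-envelope math as a small script; the rate, schedule, and purchase price are just the figures above, not general claims.

    # Back-of-envelope rent-vs-buy comparison using the numbers above;
    # adjust the assumptions to match your own usage.
    rate_per_hour = 0.70                   # rented 4090, $/hr
    hours_per_day = 8
    work_days_per_year = 20 * 12 - 8 - 14  # minus holidays and paid time off
    years = 3

    rental_cost = rate_per_hour * hours_per_day * work_days_per_year * years
    purchase_price = 2200                  # RTX 4090 bought outright

    print(f"rental over {years} years: ${rental_cost:,.0f}")  # ~$3,662
    print(f"purchase price:           ${purchase_price:,.0f}")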

One nice thing about renting is that it gives you flexibility in terms of what you want to try.

If you're really looking for the best deals, look at third-party hosts serving open models with API-based pricing, or honestly a Claude subscription can easily be worth it if you use LLMs a fair bit.

replies(1): >>44725791 #
7. whimsicalism No.44725791
1. I agree - there are absolutely scenarios in which it can make sense to buy a GPU and run it yourself. If you are managing a software firm with multiple employees, you very well might break even in less than a few years. But I would wager this is not the case for 90%+ of people self-hosting these models, unless they have some other good reason (like gaming) to buy a GPU.

2. I basically agree with your caveats - excluding electricity is a pretty big exclusion, and I don't think you've had 3 years of really high-value self-hostable models; I'd really only say the last year, and I'm somewhat skeptical of how good the ones that can be hosted in 24GB of VRAM are. 4x4090 is a different story.