Depends on what you're doing. Just chatting with the AI?
I'm getting about 7 tokens per second for Mistral with the Q6_K quant on a bog-standard Intel i5-11400 desktop with 32GB of memory and no discrete GPU (the CPU has Intel UHD Graphics 730 built in). That's a two-year-old low-end CPU that goes for what, $150 these days? As far as I'm concerned that's conversational speed. Pop in a modern 8-core CPU and I'm betting you can double that, without even involving a GPU.
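For anyone who wants to reproduce this, it's a one-liner once llama.cpp is built and you've downloaded a GGUF quant (the model filename below is just an example; -t 6 matches the i5-11400's six physical cores):

    ./main -m mistral-7b-instruct-v0.1.Q6_K.gguf \
        -t 6 -c 2048 -n 256 \
        -p "Explain quantization in one paragraph."

Add -i --color if you want an interactive chat session instead of a one-shot prompt.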
People way overestimate what they need to play around with models these days. Use llama.cpp, buy the extra $80 worth of RAM, and you're all in at about half the price of a comparable Mac. Bigger models? Buy more RAM; it's cheap.
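Rough sizing math, since "buy more RAM" really is the whole story: model file size is roughly params × bits-per-weight / 8, and Q6_K is about 6.6 bits per weight, so

    7B  at Q6_K  ≈  6 GB
    13B at Q6_K  ≈ 11 GB
    70B at Q6_K  ≈ 57 GB   (≈ 41 GB at Q4_K_M)

plus a couple GB of headroom for the KV cache and the OS. Ballpark figures from the quant math, not exact file sizes.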
There's a $487 special on Newegg today with an i7-12700KF, motherboard, and 32GB of RAM. Add another $300 worth of case, power supply, SSD, and more RAM and you're under the price of a MacBook Air. There's your LLM inference machine (not for training, obviously), one that can run even the 70B models at home at acceptable conversational speed.
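One caveat on the 12700KF: it's a hybrid part (8 P-cores + 4 E-cores), and llama.cpp is reportedly often faster pinned to just the P-cores, so it's worth benchmarking thread counts before settling on one. The llama-bench tool that ships in the repo takes comma-separated values for this (model filename again just an example):

    ./llama-bench -m llama-2-70b-chat.Q4_K_M.gguf -t 8,12

and prints tokens/sec for each setting, so you can see whether the E-cores help or hurt on your box.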