Also, Macs are not very good inference machines. Only the diehards believe they are.
I don't think Digits will perform well either.
If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.
And we know why they no longer ship NVLink on prosumer GPUs: they control almost the entire segment, so why give more away for free? Good for the company and investors, bad for us consumers.
Qwen 2.5 32B on OpenRouter is $0.16 per million output tokens. At your 16 tokens per second, 1 million tokens is about 17 hours of continuous output.
OpenRouter will charge you 16 cents for that.
I think you may want to reevaluate which is the real budget choice here.
Edit: elaborating, the extra 16GB of RAM on the Mac to hold the Qwen model costs $400, which buys roughly 1,800 days of continuous output at that rate. And that's assuming electricity is free.
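If you want to check the arithmetic or plug in your own numbers, here is a rough sketch in Python. The three constants are the figures quoted in this comment; the function names are just made up for illustration, and it ignores electricity, input-token pricing, and everything else.

```python
# Figures quoted in the comment; everything else is derived from them.
PRICE_PER_MILLION_TOKENS = 0.16  # USD per million output tokens (Qwen 2.5 32B on OpenRouter)
LOCAL_TOKENS_PER_SECOND = 16     # local generation speed
EXTRA_RAM_COST = 400.0           # USD for the extra 16GB of RAM


def hours_per_million_tokens(tokens_per_second):
    """Hours of continuous generation needed to emit one million tokens."""
    return 1_000_000 / tokens_per_second / 3600


def break_even_days(hardware_cost, price_per_million, tokens_per_second):
    """Days of nonstop local output before the hardware cost matches the API bill."""
    millions_of_tokens = hardware_cost / price_per_million
    return millions_of_tokens * hours_per_million_tokens(tokens_per_second) / 24


hours = hours_per_million_tokens(LOCAL_TOKENS_PER_SECOND)
days = break_even_days(EXTRA_RAM_COST, PRICE_PER_MILLION_TOKENS, LOCAL_TOKENS_PER_SECOND)

print(f"{hours:.1f} hours of output per million tokens")
print(f"{days:.0f} days of continuous output to spend ${EXTRA_RAM_COST:.0f} via the API")
```

With these inputs it comes out to about 17.4 hours per million tokens and roughly 1,800 days to break even on the RAM upgrade alone.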
And log everything too?