
623 points magicalhippo | 1 comments
Karupan ◴[] No.42619320[source]
I feel this is bigger than the 5x series GPUs. Given the craze around AI/LLMs, this can also potentially eat into Apple’s slice of the enthusiast AI dev segment once the M4 Max/Ultra Mac minis are released. I sure wish I held some Nvidia stock; they seem to have been doing everything right in the last few years!
doctorpangloss ◴[] No.42619769[source]
What slice?

Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.

I don't think Digits will perform well either.

If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.

Karupan ◴[] No.42619818[source]
They are perfectly fine for certain people. I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32gb at ~16 tok/sec. At least in my circle, people are budget conscious and would prefer using existing devices rather than pay for subscriptions where possible.

And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment, so why give more away for free? Good for the company and investors, bad for us consumers.

acchow ◴[] No.42620177[source]
> I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32gb at ~16 tok/sec. At least in my circle, people are budget conscious

Qwen 2.5 32B on openrouter is $0.16/million output tokens. At your 16 tokens per second, 1 million tokens is 17 continuous hours of output.

Openrouter will charge you 16 cents for that.

I think you may want to reevaluate which is the real budget choice here.

Edit: to elaborate, the extra 16GB of RAM on the Mac needed to hold the Qwen model costs $400, which buys roughly 1770 days of continuous output from OpenRouter. All assuming electricity is free.
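
For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted in this comment (16 tok/sec, $0.16 per million output tokens, $400 for the extra RAM). The function names are made up for illustration, and it assumes free electricity and ignores that the local model is 14B while the quoted price is for the 32B model.

```python
# Back-of-the-envelope check of the figures quoted in the comment above.
# All constants come from the thread; the function names are hypothetical.

TOKENS_PER_SECOND = 16          # local generation speed quoted in the thread
PRICE_PER_MILLION_USD = 0.16    # hosted output price quoted in the thread
EXTRA_RAM_COST_USD = 400        # quoted cost of the extra 16GB of RAM


def hours_per_million_tokens(tokens_per_second: float) -> float:
    """Hours of continuous output needed to produce one million tokens."""
    return 1_000_000 / tokens_per_second / 3600


def break_even_days(hardware_cost_usd: float,
                    price_per_million_usd: float,
                    tokens_per_second: float) -> float:
    """Days of continuous local output that the hardware cost would buy
    from the hosted service instead (electricity assumed free)."""
    millions_of_tokens = hardware_cost_usd / price_per_million_usd
    total_hours = millions_of_tokens * hours_per_million_tokens(tokens_per_second)
    return total_hours / 24


if __name__ == "__main__":
    hours = hours_per_million_tokens(TOKENS_PER_SECOND)
    days = break_even_days(EXTRA_RAM_COST_USD, PRICE_PER_MILLION_USD,
                           TOKENS_PER_SECOND)
    print(f"{hours:.2f} hours per million tokens")
    print(f"{days:.0f} days of continuous output to break even")
```

Without rounding, a million tokens takes about 17.4 hours and the break-even comes out near 1808 days; the 1770 figure follows from rounding to 17 hours first. Either way the conclusion is the same.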

1. Karupan ◴[] No.42620372[source]
It's a no-brainer for me because I already own the MacBook and I don't mind waiting a few extra seconds. Also, I didn't buy the Mac for this purpose; it's just my daily device. So yes, I'm sure OpenRouter is cheaper, but I just don't have to think about using it as long as the open models are reasonably good for my use. Of course, your needs may be quite different.