←back to thread

623 points magicalhippo | 1 comments | | HN request time: 0s | source
Show context
Karupan ◴[] No.42619320[source]
I feel this is bigger than the 5x series GPUs. Given the craze around AI/LLMs, this can also potentially eat into Apple’s slice of the enthusiast AI dev segment once the M4 Max/Ultra Mac minis are released. I sure wished I held some Nvidia stocks, they seem to be doing everything right in the last few years!
replies(21): >>42619339 #>>42619433 #>>42619472 #>>42619544 #>>42619769 #>>42620175 #>>42620289 #>>42620359 #>>42620740 #>>42621569 #>>42621821 #>>42622149 #>>42622154 #>>42622259 #>>42622359 #>>42622567 #>>42622577 #>>42622621 #>>42622863 #>>42627093 #>>42627188 #
doctorpangloss ◴[] No.42619769[source]
What slice?

Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.

I don't think Digits will perform well either.

If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.

replies(2): >>42619816 #>>42619818 #
Karupan ◴[] No.42619818[source]
They are perfectly fine for certain people. I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32gb at ~16 tok/sec. At least in my circle, people are budget conscious and would prefer using existing devices rather than pay for subscriptions where possible.

And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment and why give more away for free? Good for the company and investors, bad for us consumers.

replies(1): >>42620177 #
acchow ◴[] No.42620177[source]
> I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32gb at ~16 tok/sec. At least in my circle, people are budget conscious

Qwen 2.5 32B on openrouter is $0.16/million output tokens. At your 16 tokens per second, 1 million tokens is 17 continuous hours of output.

Openrouter will charge you 16 cents for that.

I think you may want to reevaluate which is the real budget choice here

Edit: elaborating, that extra 16GB ram on the Mac to hold the Qwen model costs $400, or equivalently 1770 days of continuous output. All assuming electricity is free

replies(4): >>42620372 #>>42621086 #>>42621314 #>>42621716 #
1. moffkalast ◴[] No.42621314[source]
It's a great option if you want to leak your entire internal codebase to 3rd parties.