I mean, this is awfully close to being "Her" in a box, right?
I wonder how it would go as a productivity/tinkering/gaming rig? Could a GPU potentially be stacked in the same way an additional Digit can?
Surely a smaller market than gamers or datacenters.
Also, I don't particularly want my data to be processed by anyone else.
It’s purely an ecosystem play imho. It benefits the kind of people who will go on to make potentially cool things and will stay loyal.
Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.
I don't think Digits will perform well either.
If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.
They are good for single-batch inference and have very good tok/sec/user. ollama works perfectly on a Mac.
And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment and why give more away for free? Good for the company and investors, bad for us consumers.
100%
The people who prototype on a 3k workstation will also be the people who decide how to architect for a 3k GPU buildout for model training.
Plus, YouTube and Google Images are already full of AI-generated slop and people are already tired of it. "AI fatigue" among the majority of general consumers is a documented thing. Gaming fatigue is not.
I think it isn't about enthusiasts. To me it looks like Huang/NVDA is using the opening provided by the AI wave to push a small revolution further - until now the GPU was an add-on to the general computing core, onto which that core offloaded some computing. With AI, that offloaded computing becomes de facto the main computing, and Huang/NVDA is turning the tables by making the CPU just a small add-on to the GPU, with some general computing offloaded to that CPU.
With the CPU located that "close" and with unified memory, that would stimulate parallelization of a lot of general computing so it runs on the GPU, much faster that way, instead of on the CPU. Take a classic of enterprise computing - SQL databases: a lot of what they do, and with some work perhaps everything, can be executed on the GPU with a significant performance gain over the CPU. Why isn't it happening today? Loading/unloading data onto the GPU eats into performance, the complexity of offloading only some operations is very high in dev effort, etc. Streamlined development on a platform with unified memory will change that. That way Huang/NVDA may pull the rug out from under CPU-first platforms like AMD/INTC and end up owning both - the new AI computing as well as a significant share of classic enterprise computing.
Qwen 2.5 32B on openrouter is $0.16/million output tokens. At your 16 tokens per second, 1 million tokens is 17 continuous hours of output.
Openrouter will charge you 16 cents for that.
I think you may want to reevaluate which is the real budget choice here
Edit: elaborating, that extra 16GB of RAM on the Mac to hold the Qwen model costs $400, or equivalently 1,770 days of continuous output. All assuming electricity is free.
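For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch in Python (the $0.16/M price and 16 tok/s figure come from the comments above; the $400 RAM upgrade price is the commenter's own number):

    # Back-of-the-envelope: local RAM upgrade vs. paying per output token on OpenRouter.
    price_per_million = 0.16    # USD per million output tokens (figure quoted above)
    tokens_per_second = 16      # local generation speed cited above
    ram_upgrade_cost = 400.0    # USD for the extra 16GB to hold the model (commenter's figure)

    hours_per_million = 1_000_000 / tokens_per_second / 3600
    breakeven_millions = ram_upgrade_cost / price_per_million
    breakeven_days = breakeven_millions * hours_per_million / 24

    print(f"1M tokens at {tokens_per_second} tok/s ~= {hours_per_million:.1f} hours of output")
    # ~1,770 days if you round to 17 hours per million tokens, as the comment does
    print(f"${ram_upgrade_cost:.0f} ~= {breakeven_millions:.0f}M tokens ~= {breakeven_days:.0f} days of continuous output")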
On the other hand, with a $5000 MacBook Pro, I can easily load a 70B model and have a "full" MacBook Pro as a plus. I am not sure I fully understand the value of these cards for someone who wants to run personal AI models.
Plus you have fast interconnects, if you want to stack them.
I was somewhat attracted by the Jetson AGX Orin with 64 GB RAM, but this one is a no-brainer for me, as long as idle power is reasonable.
I have a bit of an interest in games too.
If I could get one platform for both, I could justify 2k maybe a bit more.
I can't justify that for just one half: running games on Mac, right now via Linux: no thanks.
And on the PC side, Nvidia consumer cards only go to 24GB, which is a bit limiting for LLMs, while being very expensive - and I only play games every few months.
Do I buy a MacBook with a silly amount of RAM when I only want to mess with images occasionally?
Or do I get a big Nvidia card, topping out at 24GB - still small for some LLMs - but at least I could occasionally play games on it?
Incredible fumble for me personally as an investor
And log everything too?
No. There's already too much porn on the internet, and AI porn is cringe and will get old very fast.
And if you truly did predict that Nvidia would own those markets and that those markets would be massive, you could have also bought Amazon, Google, or heck, even Bitcoin. Anything you touched in tech would have made you a millionaire, really.
It will be massive for research labs. Most academics have to jump through a lot of hoops to get to play with not just CUDA, but also GPUDirect/RDMA/Infiniband etc. If you get older/donated hardware, you may have a large cluster but not newer features.
No, they can’t. GPU databases are niche products with severe limitations.
GPUs are fast at massively parallel math problems; they aren't useful for all tasks.
Mac Pro [0] is a desktop with M2 Ultra and up to 192GB of unified memory.
Those Macs with unified memory are a threat he is immediately addressing. Jensen is a wartime CEO from the looks of it; he's not joking.
No wonder AMD is staying out of the high end space, since NVIDIA is going head on with Apple (and AMD is not in the business of competing with Apple).
How so?
Only 40% of gamers use a PC, a portion of those use AI in any meaningful way, and a fraction of those want to set up a local AI instance.
Then someone releases an uncensored, cloud based AI and takes your market?
Today. For reasons like the ones I mentioned.
>GPUs are fast at massively parallel math problems, they aren't useful for all tasks.
GPUs are fast at massively parallel tasks. Their memory bandwidth is about 10x that of a CPU, for example. So typical database operations that are massively parallel in nature, like join or filter, would run about that much faster.
The majority of computing can be parallelized and would thus benefit from being executed on the GPU (given unified memory at sizes practically usable for enterprise workloads, like 128GB).
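To make the join/filter point concrete, here is a minimal sketch of a bandwidth-bound column filter, assuming a CUDA-capable GPU with CuPy installed; the column size and threshold are made up for illustration:

    import numpy as np
    import cupy as cp  # assumes a CUDA-capable GPU with CuPy installed

    # A made-up 100M-row 'price' column standing in for a database column.
    price_cpu = np.random.rand(100_000_000).astype(np.float32)

    # CPU filter: a full scan, limited by host memory bandwidth.
    hits_cpu = price_cpu[price_cpu > 0.99]

    # GPU filter: the same scan, limited by (much higher) device memory bandwidth.
    # On unified-memory hardware, the explicit copy below is the step that goes away.
    price_gpu = cp.asarray(price_cpu)
    hits_gpu = price_gpu[price_gpu > 0.99]

    print(len(hits_cpu), int(hits_gpu.size))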
This is a genius move. I am more baffled by the insane form factor that can pack this much power inside a Mac Mini-esque body. For just $6000, two of these can run 400B+ models locally. That is absolutely bonkers. Imagine running ChatGPT on your desktop. You couldn’t dream about this stuff even 1 year ago. What a time to be alive!
Titanic - so about to hit an iceberg and sink?
No one goes to an Apple store thinking "I'll get a laptop to do AI inference".
About that... Not like there isn't a lot to be desired from the linux drivers: I'm running a K80 and M40 in a workstation at home and the thought of having to ever touch the drivers, now that the system is operational, terrifies me. It is by far the biggest "don't fix it if it ain't broke" thing in my life.
Xeon Phi failed for a number of reasons, but one where it didn't need to fail was the availability of software optimised for it. Now we have Xeons and EPYCs, and MI300C's with lots of efficient cores, but we could have been writing software tailored for those for 10 years now. Extracting performance from them would be a solved problem at this point. The same applies to Itanium - the very first thing Intel should have made sure it had was good Linux support. They could have had it before the first silicon was released. Itanium was well supported for a while, but it's long dead by now.
Similarly, Sun failed with SPARC, which also didn't have an easy onboarding path after they gave up on workstations. They did some things right: OpenSolaris ensured the OS remained relevant (it still is, even if a bit niche), and looking the other way on x86 Solaris helped people learn and train on it. Oracle Cloud could, at least, offer it on cloud instances. That would be nice.
Now we see IBM doing the same - there is no reasonable entry level POWER machine that can compete in performance with a workstation-class x86. There is a small half-rack machine that can be mounted on a deskside case, and that's it. I don't know of any company that's planning to deploy new systems on AIX (much less IBMi, which is also POWER), or even for Linux on POWER, because it's just too easy to build it on other, competing platforms. You can get AIX, IBMi and even IBMz cloud instances from IBM cloud, but it's not easy (and I never found a "from-zero-to-ssh-or-5250-or-3270" tutorial for them). I wonder if it's even possible. You can get Linux on Z instances, but there doesn't seem to be a way to get Linux on POWER. At least not from them (several HPC research labs still offer those).
The cutting edge will advance, and convincing bespoke porn of people's crushes/coworkers/bosses/enemies/toddlers will become a thing. With all the mayhem that results.
Performance is not amazing (roughly 4060 level, I think?) but in many ways it was the only game in town unless you were willing and able to build a multi-3090/4090 rig.
Suppose you're a content creator and you need an image of a real person or something copyrighted like a lot of sports logos for your latest YouTube video's thumbnail. That kind of thing.
I'm not getting into how good or bad that is; I'm just saying I think it's a pretty common use case.
(example: a thumbnail for a YT video about a video game, featuring AI-generated art based on that game. because copyright reasons, in my very limited experience Dall-E won't let you do that)
I agree that AI porn doesn't seem a real market driver. With 8 billion people on Earth I know it has its fans I guess, but people barely pay for porn in the first place so I reallllly dunno how many people are paying for AI porn either directly or indirectly.
It's unclear to me if AI generated video will ever really cross the "uncanny valley." Of course, people betting against AI have lost those bets again and again but I don't know.
Maybe (LP)CAMM2 memory will make model usage just cheap enough that I can have a hosting server for it and do my usual midrange gaming GPU thing before then.
They were propelled by the unexpected LLM boom. But plan 'A' was robotics, in which NVIDIA has invested a lot for decades. I think its time is about to come, with Tesla's humanoids at $20-30k and the Chinese already selling theirs for $16k.
Sad to see that big companies like Intel and AMD don't understand this, but then they've never come to terms with the fact that software killed the hardware star.
The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. The memory can also be carved up to a distinct pool dedicated to the GPU only, thus delivering an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.
[...]
AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.
Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.
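To put the 256 GB/s number in context, here is a rough, bandwidth-bound estimate of single-batch decode speed; the ~40 GB weight size (roughly 70B parameters at ~4-bit quantization) is an illustrative assumption, not a figure from the article:

    # Rough upper bound: every generated token streams the full set of weights once.
    bandwidth_gb_s = 256   # Strix Halo memory throughput quoted above
    weights_gb = 40        # ~70B parameters at ~4-bit quantization (assumption)

    print(f"~{bandwidth_gb_s / weights_gb:.1f} tok/s upper bound (ignores KV cache and compute)")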
I’m so tired of this recent obsession with the stock market. Now that retail is deeply invested, it is tainting everything, even here on a technology forum. I don’t remember people mentioning Apple stock every time Steve Jobs made an announcement in past decades. Nowadays it seems everyone is invested in Nvidia and just wants the stock to go up, and every product announcement is a means to that end. I really hope we get a crash so that we can get back to a saner relationship with companies and their products.
"The global gaming market size was valued at approximately USD 221.24 billion in 2024. It is forecasted to reach USD 424.23 billion by 2033, growing at a CAGR of around 6.50% during the forecast period (2025-2033)"
I needed an uncensored model in order to, guess what, make an AI draw my niece snowboarding down a waterfall. All the online services refuse on basis that the picture contains -- oh horrors -- a child.
"Uncensored" absolutely does not imply NSFW.
Much of the growth in gaming of late has come from exploitive dark patterns, and those dark patterns eventually stop working because users become immune to them.
If what you say is true, you were among the first 100 people on the planet doing this - which, btw, further supports my argument about how extremely rare that use case is for Mac users.
Strix Halo is a replacement for the high-power laptop CPUs from the HX series of Intel and AMD, together with a discrete GPU.
The thermal design power of a laptop CPU-dGPU combo is normally much higher than 120 W, which is the maximum TDP recommended for Strix Halo. The faster laptop dGPUs want more than 120 W only for themselves, not counting the CPU.
So any claims of being surprised that the TDP range for Strix Halo is 45 W to 120 W are weird, as if the commenter had never seen a gaming laptop or a mobile workstation laptop.
Windows has always been a barrier to hardware feature adoption for Intel. You had to wait 2 to 3 years, sometimes longer, for Windows to get around to providing hardware support.
Any OS optimizations in Windows had to go through Microsoft. So say you added some instructions, custom silicon, or whatever to speed up enterprise databases, or provided high-speed networking that needed some special kernel features, etc. - there was always Microsoft in the way.
Not just in the foot-dragging communication - getting the tech people aligned was a problem.
Microsoft would look at every single change: whether or not it would challenge their monopoly, whether or not it was in their business interest, whether or not it kept you, the hardware vendor, in a subservient role.
AMD/Intel have to work directly with Microsoft to ship new silicon that requires it.
0. https://www.macstadium.com/blog/m4-mac-mini-review
1. https://www.apple.com/mac/compare/?modelList=Mac-mini-M4,Mac...
They did not collapse, they moved to smartphones. The "free"-to-play gacha portion of the gaming market is so successful it is most of the market. "Live service" games are literally traditional game makers trying to grab a tiny slice of that market, because it's infinitely more profitable than making actual games.
>those dark patterns eventually stop working because users become immune to them.
Really? Slot machines have been around for generations and have not become any less effective. Gambling of all forms has relied on the exact same physiological response for millennia. None of this is going away without legislation.
Slot machines are not a growth market. The majority of people wised up to them literal generations ago, although enough people remain susceptible to maintain a handful of city economies.
> They did not collapse, they moved to smartphones
Agreed, but the dark patterns being used are different. The previous dark patterns became ineffective. The level of sophistication of psychological trickery in modern f2p games is far beyond anything Farmville ever attempted.
The rise of live service games also does not bode well for infinite growth in the industry, as there are only so many hours in the day for playing games, and even the evilest player manipulation techniques can only squeeze so much blood from a stone.
The industry is already seeing the failure of new live service games to launch, possibly analogous to what happened in the MMO market when there was a rush of releases after WoW. With the exception of addicts, most people can only spend so many hours a day playing games.
A real shame it's not running mainline Linux - I don't like their distro based on Ubuntu LTS.
I think this is a race that Apple doesn't know it's part of. Apple has something that happens to work well for AI, as a side effect of having a nice GPU with lots of fast shared memory. It's not marketed for inference.
I can't find the exact Youtube video, but it's out there.
Normally? Much higher than 120W? Those are some pretty abnormal (and dare I say niche?) laptops you're talking about there. Remember, that's not peak power - thermal design power is what the laptop should be able to power and cool pretty much continuously.
At those power levels, they're usually called DTR: desktop replacement. You certainly can't call it "just a laptop" anymore once we're in needs-two-power-supplies territory.
https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
"The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation."
https://www.nvidia.com/en-us/data-center/grace-cpu-superchip...
"Grace is the first data center CPU to utilize server-class high-speed LPDDR5X memory with a wide memory subsystem that delivers up to 500GB/s of bandwidth "
As far as I can see, that is about 4x that of Zen 5.
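A rough sanity check on the 200-billion-parameter claim, assuming ~4-bit quantization and treating the 500GB/s Grace figure quoted above as an optimistic upper bound (the actual DIGITS memory bandwidth isn't given in these quotes):

    # Does a ~200B-parameter model fit in 128GB, and how fast might it decode?
    params_b = 200
    bytes_per_param = 0.5       # ~4-bit quantization (assumption)
    unified_memory_gb = 128
    bandwidth_gb_s = 500        # Grace LPDDR5X figure quoted above (upper bound)

    weights_gb = params_b * bytes_per_param
    print(f"weights ~= {weights_gb:.0f} GB, fits in {unified_memory_gb} GB: {weights_gb <= unified_memory_gb}")
    print(f"bandwidth-bound decode ~= {bandwidth_gb_s / weights_gb:.1f} tok/s (single batch)")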
We still schedule "bi-weekly" meetings.
We can't agree on which way charge goes in a wire.
Have you seen the y-axis on an economists chart?
I do not know what the proportion of gaming laptops and mobile workstations vs. thin-and-light laptops is. While obviously there must be many more light laptops, gaming laptops cannot be a niche product, because there are too many models offered by a lot of vendors.
My own laptop is a Dell Precision, so it belongs to this class. I would not call Dell Precision laptops a niche product, even if they are typically used only by professionals.
My previous laptop was some Lenovo Yoga that also belonged to this class, having a discrete NVIDIA GPU. In general, any laptop having a discrete GPU belongs to this class, because the laptop CPUs intended to be paired with discrete GPUs have a default TDP of 45 W or 55 W, while the smallest laptop discrete GPUs may have TDPs of 55 W to 75 W, but the faster laptop GPUs have TDPs between 100 W and 150 W, so the combo with CPU reaches a TDP around 200 W for the biggest laptops.
Given workload A, how much of the total runtime would JOIN or FILTER take compared to, for example, the storage engine layer? My gut feeling tells me not much, since to see the actual gain you'd need to be able to parallelize everything, including the storage engine.
IIRC all the startups building databases around GPUs failed to deliver in the last ~10 years. All of them are shut down if I am not mistaken.
This hardware is only good for current-generation "AI".
If that is true, their path to profitability isn't super rocky. Their path to achieving their current valuation may end up being trickier, though!
How about attaching SSD-based storage to NVLink? :) Nvidia does have the direct-to-memory tech and uses wide buses, so I don't see any issue with them direct-attaching arrays of SSDs if they feel like it.
>IIRC all the startups building databases around GPUs failed to deliver in the last ~10 years. All of them are shut down if I am not mistaken.
As I already said - the model of a database offloading some ops to a GPU with its own separate memory isn't feasible, and those startups confirmed it. Especially when the GPU has 8-16GB while main RAM can easily be 1-2TB with 100-200 CPU cores. With 128GB of unified memory like on the GB10, the situation looks completely different (that Nvidia allows only 2 to be connected by NVLink is just market segmentation, not a real technical limitation).
True passion for one's career is rare, despite the clichéd platitudes encouraging otherwise. That's something we should encourage and invest in regardless of the field.
Not sure it'd be competitive in price with other workstation-class machines. I don't know how expensive IBM's S1012 deskside is, but with only 64 threads it'd be a meh workstation.
In other words, and hypothetically, if you can make logical plan execution run 2x faster by rewriting the algorithms to use GPU resources, but physical plan execution remains bottlenecked by the storage engine, then the total gain is negligible.
But I guess there could be some use case where this could prove to be a win.
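That intuition is just Amdahl's law; a quick sketch with made-up fractions for how much of the runtime the GPU-friendly operators account for:

    # Amdahl's law: overall speedup when only part of the work gets faster.
    def overall_speedup(gpu_fraction: float, gpu_speedup: float) -> float:
        return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)

    # Made-up split: JOIN/FILTER are 30% of runtime, storage engine etc. the rest.
    print(overall_speedup(0.3, 10.0))   # ~1.4x overall despite a 10x faster JOIN/FILTER
    print(overall_speedup(0.9, 10.0))   # ~5.3x if nearly everything moves to the GPU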
I have to agree the desktop experience of the Mac is great, on par with the best Linuxes out there.
The one thing I wonder is noise. That box is awfully small for the amount of compute it packs, and high-end Mac Studios are 50% heatsink. There isn’t much space in this box for a silent fan.
Quote:
"It also supports up to 128GB of unified memory, so developers can easily interact with LLMs that have nearly 200 billion parameters."