
623 points magicalhippo | 2 comments
magicalhippo No.42619182
Not much was unveiled, but it showed a Blackwell GPU with 1 PFLOP of FP4 compute, 128 GB of unified LPDDR5X memory, 20 ARM cores, and ConnectX networking driving two QSFP ports, so multiple units can be stacked.

edit: While the title says "personal", Jensen said it was aimed at startups and the like, so not necessarily your living room.

replies(1): >>42619341 #
computably No.42619341
From the size and pricing ($3000) alone, it's safe to conclude it has fewer raw FLOPS than a 5090. Since it uses LPDDR5X, it almost certainly has less memory bandwidth too (5090 @ 1.8 TB/s vs. M4 Max w/ 128GB LPDDR5X @ 546 GB/s). Basically its only advantage is how much VRAM it packs into a small form factor, plus presumably greater power efficiency at its smaller scale.
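For a sense of what that bandwidth gap means in practice: token generation on large models is usually memory-bandwidth-bound, since every generated token streams the full set of weights, so bandwidth sets a hard ceiling on decode speed. A rough sketch (the 40 GB model size is a hypothetical example; the bandwidth figures are the ones quoted above):

```python
# Rough decode-speed ceiling: generating one token streams every model
# weight from memory once, so tokens/sec <= bandwidth / model size.
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_gb = 40  # hypothetical ~70B-parameter model at ~4-bit quantization

for name, bw_gb_s in [("RTX 5090", 1800), ("M4 Max", 546)]:
    print(f"{name}: ~{max_tokens_per_sec(bw_gb_s, model_gb):.0f} tok/s ceiling")
```

Real throughput lands well below these ceilings, but the ratio between devices tends to hold.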

The only thing it really competes with is the Mac Studio for LocalLlama-type enthusiasts and devs. It isn't cheap enough to dent the used market, nor powerful enough to stand in for bigger cards.

replies(4): >>42619643 #>>42620016 #>>42620598 #>>42622446 #
1. llm_nerd No.42622446
The product isn't even finalized. It might never come to fruition, and I can't see how they'll make the power profile work. I'm skeptical that a $3000 device with 128GB of RAM and a 4TB SSD will hit the stated specs within the next year, but let's pretend it will.

However, we do know that it offers a quarter the TOPS of the new 5090, which makes it less powerful than the $600 5070. Of course it will be, given the power constraints.

The only really compelling value is that Nvidia memory-starves its desktop cards so severely. That's the small opening Apple found, even though Apple's FP4/FP8 performance is a world below what Nvidia offers. Purely from that perspective this is a winning product, as 128GB opens up a lot of possibilities. From a raw performance perspective, though, it will pale next to Nvidia's other products.

replies(1): >>42632090 #
2. computably No.42632090
The "AI TOPS" numbers for Blackwell / the 5090 are probably quoted for a narrow numeric type like INT8 or INT4.

At FP32 (and FP16, assuming the consumer cards are still neutered), the 5090 apparently does ~105-107 TFLOPS, and the full GB202 ~125 TFLOPS. A non-neutered GB202-based card could therefore hit ~250 TFLOPS of FP16, which lines up neatly with 1 PFLOP of FP4.
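That back-of-the-envelope relies on throughput doubling each time precision halves, which is how dense tensor-core numbers typically scale:

```python
# Dense tensor-core throughput typically doubles each time precision halves:
# FP32 -> FP16 -> FP8 -> FP4, x2 per step.
fp32_tflops = 125               # full-GB202 figure quoted above
fp16_tflops = fp32_tflops * 2   # 250
fp8_tflops = fp16_tflops * 2    # 500
fp4_tflops = fp8_tflops * 2     # 1000 TFLOPS = 1 PFLOP
print(fp16_tflops, fp8_tflops, fp4_tflops)  # 250 500 1000
```

(Marketing numbers often further double this by quoting sparse throughput, which is worth checking before comparing spec sheets.)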

In reality, FP4 is more-than-linearly efficient relative to FP32. They quoted FP4 and not FP8/FP16 for a reason. I wouldn't be too surprised if it doesn't even support FP32, maybe not even FP16. Plus, they likely cut RT cores and other graphics hardware, making for a smaller and therefore more power-efficient chip, since they're positioning this as an "AI supercomputer" and that hardware doesn't make sense for most graphical applications.

I see no reason this product wouldn't come to market, besides the usual supply/demand issues. There's value for a small niche at a particular price bracket: enthusiasts running large q4 models, where it's cheaper but slower than dedicated cards (3x-10x the price per GB of VRAM) and price-competitive with, but much faster than, Apple silicon. It's a good strategic move for maintaining Nvidia's hold on the ecosystem, regardless of the sales revenue.
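For scale on the "large q4 models" point: 4-bit quantization stores roughly half a byte per parameter, so 128 GB fits models that would otherwise need several dedicated GPUs. A rough sketch (the parameter counts and the 10% overhead factor are illustrative assumptions, not spec figures):

```python
def q4_footprint_gb(params_billion: float, overhead: float = 1.1) -> float:
    """Approximate memory for a 4-bit-quantized model: 0.5 bytes per
    parameter, plus ~10% assumed for KV cache and runtime overhead."""
    return params_billion * 0.5 * overhead

for b in (70, 180):  # hypothetical parameter counts
    gb = q4_footprint_gb(b)
    print(f"{b}B params @ q4: ~{gb:.0f} GB (fits in 128 GB: {gb < 128})")
```

By the same arithmetic, a 24-32 GB card tops out around 40-60B parameters at q4, which is the gap this box targets.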