
486 points dbreunig | 70 comments
1. jsheard ◴[] No.41863390[source]
These NPUs are tying up a substantial amount of silicon area so it would be a real shame if they end up not being used for much. I can't find a die analysis of the Snapdragon X which isolates the NPU specifically but AMDs equivalent with the same ~50 TOPS performance target can be seen here, and takes up about as much area as three high performance CPU cores:

https://www.techpowerup.com/325035/amd-strix-point-silicon-p...

replies(4): >>41863880 #>>41863905 #>>41864412 #>>41865466 #
2. Kon-Peki ◴[] No.41863880[source]
Modern chips have to dedicate a certain percentage of the die to dark silicon [1] (or else they melt/throttle to uselessness), and these kinds of components count towards that amount. So the point of these components is to be used, but not to be used too much.

Instead of an NPU, they could have used those transistors and die space for any number of things. But they wouldn't have put additional high performance CPU cores there - that would increase the power density too much and cause thermal issues that can only be solved with permanent throttling.

[1] https://en.wikipedia.org/wiki/Dark_silicon

replies(2): >>41864171 #>>41865813 #
3. ezst ◴[] No.41863905[source]
I can't wait for the LLM fad to be over so we get some sanity (and efficiency) back. I personally have no use for this extra hardware ("GenAI" doesn't help me in any way nor support any work-related tasks). Worse, most people have no use for it (and recent surveys even show predominant hostility towards AI creep). We shouldn't be paying extra for it; it should be opt-in, and then it would become clear (by looking at the sales, and how few are willing to pay a premium for "AI") how overblown and unnecessary this is.
replies(6): >>41863966 #>>41864134 #>>41865168 #>>41865589 #>>41865651 #>>41875051 #
4. DrillShopper ◴[] No.41863966[source]
Corporatized gains in the market from hype.
Socialized losses in increased carbon emissions, upheaval from job loss, and higher prices on hardware.

The more they say the future will be better the more that it looks like the status quo.

5. renewiltord ◴[] No.41864134[source]
I was telling someone this and they gave me link to a laptop with higher battery life and better performance than my own, but I kept explaining to them that the feature I cared most about was die size. They couldn't understand it so I just had to leave them alone. Non-technical people don't get it. Die size is what I care about. It's a critical feature and so many mainstream companies are missing out on my money because they won't optimize die size. Disgusting.
replies(5): >>41864304 #>>41864691 #>>41864921 #>>41866254 #>>41866907 #
6. IshKebab ◴[] No.41864171[source]
If they aren't being used it would be better to dedicate the space to more SRAM.
replies(2): >>41864316 #>>41870432 #
7. nl ◴[] No.41864304{3}[source]
Is this a parody?

Why would anyone care about die size? And if you do why not get one of the many low power laptops with Atoms etc that do have small die size?

replies(3): >>41864414 #>>41864462 #>>41864496 #
8. a2l3aQ ◴[] No.41864316{3}[source]
The point is that parts of the CPU have to be powered off or throttled down when other components are under load to stay within the TDP; adding cache that would almost certainly be in constant use defeats the point of that.
replies(2): >>41864415 #>>41866947 #
9. JohnFen ◴[] No.41864412[source]
> These NPUs are tying up a substantial amount of silicon area so it would be a real shame if they end up not being used for much.

This has been my thinking. Today you have to go out of your way to buy a system with an NPU, so I don't have any. But tomorrow, will they just be included by default? That seems like a waste for those of us who aren't going to be running models. I wonder what other uses they could be put to?

replies(6): >>41864427 #>>41864488 #>>41864879 #>>41865208 #>>41865384 #>>41870713 #
10. throwaway48476 ◴[] No.41864414{4}[source]
Maybe through a game of telephone they confused die size and node size?
11. jsheard ◴[] No.41864415{4}[source]
Doesn't SRAM have much lower power density than logic with the same area though? Hence why AMD can get away with physically stacking cache on top of more cache in their X3D parts, without the bottom layer melting.
replies(2): >>41864778 #>>41864937 #
12. jsheard ◴[] No.41864427[source]
> But tomorrow, will they just be included by default?

That's already the way things are going: Microsoft has decreed that Copilot+ is the future of Windows, so AMD and Intel are both putting NPUs that meet the Copilot+ performance standard into every consumer part they make going forward, to secure OEM sales.

replies(2): >>41864643 #>>41870035 #
13. thfuran ◴[] No.41864462{4}[source]
Yes, they're making fun of the comment they replied to.
replies(1): >>41866239 #
14. jonas21 ◴[] No.41864488[source]
NPUs are already included by default in the Apple ecosystem. Nobody seems to mind.
replies(3): >>41864549 #>>41864903 #>>41865200 #
15. tedunangst ◴[] No.41864496{4}[source]
No, no, no, you just don't get it. The only thing Dell will sell me is a laptop 324mm wide, which is totally appalling, but if they offered me a laptop that's 320mm wide, I'd immediately buy it. In my line of work, which is totally serious business, every millimeter counts.
16. JohnFen ◴[] No.41864549{3}[source]
It's not really a question of minding if it's there, unless its presence increases cost, anyway. It just seems a waste to let it go idle, so my mind wanders to what other use I could put that circuitry to.
17. AlexAndScripts ◴[] No.41864643{3}[source]
It almost makes me want to find some use for them on my Linux box (not that it has an NPU), but I truly can't think of anything. Too small to run a meaningful LLM, and I'd want that in bursts anyway; I hate voice controls (at least with the current tech), and Recall sounds thoroughly useless. Could you do mediocre machine translation on it, perhaps? Local GitHub Copilot? An LLM that is purely used to build an abstract index of my notes in the background?

Actually, could they be used to make better AI in games? That'd be neat. A shooter character with some kind of organic tactics, or a Civilisation/Stellaris AI that doesn't suck.
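
If you did want to poke at one from Linux, a minimal sketch of what that might look like, assuming a small ONNX model and onnxruntime (the non-CPU execution provider names and the model file are placeholders for whatever vendor stack is actually installed, not a claim about any specific NPU):

    # Ask onnxruntime for a hardware execution provider if one is installed,
    # otherwise fall back to the CPU. Which providers exist depends entirely
    # on the vendor stack (Qualcomm QNN, DirectML, etc.) and is assumed here.
    import onnxruntime as ort

    preferred = ["QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in preferred if p in ort.get_available_providers()]

    # "tiny_model.onnx" is a placeholder, e.g. a small translation or embedding model
    session = ort.InferenceSession("tiny_model.onnx", providers=providers)
    print("running on:", session.get_providers()[0])
    # outputs = session.run(None, {"input_ids": token_ids})  # inputs are model-specific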

replies(2): >>41867610 #>>41872169 #
18. _zoltan_ ◴[] No.41864691{3}[source]
News flash: you're in the niche of the niche. People don't care about die size.

I'd be willing to bet that the amount of money they're missing out on is minuscule, and far outweighed by money from people who care about other stuff. Like, you know, performance and battery life, to stick to your examples.

replies(1): >>41865605 #
19. Kon-Peki ◴[] No.41864778{5}[source]
Yes, cache has a much lower power density and could have been a candidate for that space.

But I wasn’t on the design team and have no basis for second-guessing them. I’m just saying that cramming more performance CPU cores onto this die isn’t a realistic option.

20. crazygringo ◴[] No.41864879[source]
Aren't they used for speech recognition -- for dictation? Also for FaceID.

They're useful for more things than just LLM's.

replies(1): >>41866451 #
21. acchow ◴[] No.41864903{3}[source]
It enables many features on the phone that people like, all without sending your personal data to the cloud. Like searching your photos for "dog" or "receipt".
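
Roughly, those features embed each photo and the text query into a shared vector space and rank by similarity; the embedding model is the part the NPU runs. A toy sketch with random stand-in vectors (no real encoder involved):

    # Toy on-device photo search: rank photos by cosine similarity between a
    # text-query embedding and precomputed image embeddings. The vectors here
    # are random stand-ins for whatever an on-device encoder would produce.
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(0)
    photo_embeddings = {f"IMG_{i:04d}.jpg": rng.standard_normal(512) for i in range(5)}
    query_embedding = rng.standard_normal(512)  # stand-in for embedding the text "dog"

    best = max(photo_embeddings, key=lambda name: cosine(query_embedding, photo_embeddings[name]))
    print(best)  # the photo most similar to the query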
22. waveBidder ◴[] No.41864921{3}[source]
your satire is off base enough that people don't understand it's satire.
replies(2): >>41865190 #>>41866598 #
23. wtallis ◴[] No.41864937{5}[source]
The SRAM that AMD is stacking also has the benefit of being last-level cache, so it doesn't need to run at anywhere near the frequency and voltage that eg. L1 cache operates at.
24. mardifoufs ◴[] No.41865168[source]
NPUs were a thing (and a very common one in mobile CPUs too) way before the LLM craze.
25. heavyset_go ◴[] No.41865190{4}[source]
Poe's Law means it's working.
26. shepherdjerred ◴[] No.41865200{3}[source]
I actually love that Apple includes this — especially now that they’re actually doing something with it via Apple Intelligence
27. heavyset_go ◴[] No.41865208[source]
The idea is that your OS and apps will integrate ML models, so you will be running models whether you know it or not.
replies(1): >>41866421 #
28. idunnoman1222 ◴[] No.41865384[source]
Voice to text
29. kllrnohj ◴[] No.41865466[source]
Snapdragon X still has a full 12 cores (all the same core, it's homogeneous), and Strix Point is also 12 cores, in a 4+8 configuration but with the "little" cores not sacrificing that much (nothing like the little cores in ARM's designs, which might as well not exist; they're a complete waste of silicon). Consumer software doesn't scale to that, so what are you going to do with more transistors allocated to the CPU?

It's not unlike why Apple puts so many video engines in their SoCs - they don't actually have much else to do with the transistor budget they can afford. Making single thread performance better isn't limited by transistor count anymore and software is bad at multithreading.

replies(1): >>41865909 #
30. kalleboo ◴[] No.41865589[source]
> most people have no use for that

Apple originally added their NPUs before the current LLM wave to support things like indexing your photo library so that objects and people are searchable. These features are still very popular. I don't think these NPUs are fast enough for GenAI anyway.

replies(2): >>41865887 #>>41865902 #
31. mattnewton ◴[] No.41865605{4}[source]
That’s exactly what the poster is arguing- they are being sarcastic.
replies(1): >>41870152 #
32. jcgrillo ◴[] No.41865651[source]
I just got an iphone and the whole photos thing is absolutely garbage. All I wanted to do was look through my damn photos and find one I took recently but it started playing some random music and organized them in no discernible order.. like it wasn't the reverse time sorted.. Idk what kind of fucked up "creative process" came up with that bullshit but I sure wish they'd unfuck it stat.

The camera is real good though.

replies(3): >>41866225 #>>41869893 #>>41871830 #
33. jcgrillo ◴[] No.41865813[source]
Question: what's lost by making your features less dense, so that they can cool while running at full tilt?
replies(2): >>41865917 #>>41866644 #
34. wmf ◴[] No.41865887{3}[source]
MS Copilot and "Apple Intelligence" are running a small language model and image generation on the NPU so that should count as "GenAI".
replies(1): >>41866463 #
35. grugagag ◴[] No.41865902{3}[source]
I wish I could turn that off on my phone.
36. wmf ◴[] No.41865909[source]
GPU "infinity" cache would increase 3D performance and there's a rumor that AMD removed it to make room for the NPU. They're not out of ideas for features to put on the chip.
37. AlotOfReading ◴[] No.41865917{3}[source]
Messes with timing, among other things. A lot of those structures are relatively fixed blocks that are designed for specific sizes. Signals take more time to propagate longer distances, and longer conductors have worse properties. Dense and hot is faster and more broadly useful.
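
For a sense of scale, a rough back-of-envelope on why distance matters at these clocks (the wire-speed fractions below are loose assumptions; real RC-limited wires are often much slower):

    # How far can a signal get in one clock cycle at 5 GHz?
    clock_hz = 5e9               # 5 GHz -> 200 ps per cycle
    cycle_s = 1 / clock_hz
    c = 3e8                      # speed of light in vacuum, m/s
    for fraction in (1.0, 0.5, 0.3):   # assumed effective propagation speeds
        reach_mm = c * fraction * cycle_s * 1000
        print(f"{fraction:.0%} of c: ~{reach_mm:.0f} mm per cycle")
    # Even at light speed that's only ~60 mm per cycle, so spreading blocks
    # apart quickly costs extra pipeline stages or clock frequency.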
replies(1): >>41866001 #
38. jcgrillo ◴[] No.41866001{4}[source]
Interesting, so does that mean we're basically out of runway without aggressive cooling?
replies(1): >>41867074 #
39. james_marks ◴[] No.41866225{3}[source]
There’s an album called “Recents” that’s chronological and scrolled to the end.

“Recent” seems to mean everything; I’ve got 6k+ photos, I think since the last fresh install, which is many devices ago.

Sounds like the view you’re looking for and will stick as the default once you find it, but you do have to bat away some BS at first.

40. singlepaynews ◴[] No.41866239{5}[source]
Would you do me the favor of explaining the joke? I get the premise (nobody cares about die size), but the comment being mocked seems perfectly innocuous to me? They want a laptop without an NPU b/c according to the link we get more out of the CPU anyway? What am I missing here?
replies(1): >>41867686 #
41. fijiaarone ◴[] No.41866254{3}[source]
Yeah, I know what you mean. I hate lugging around a big CPU core.
42. JohnFen ◴[] No.41866421{3}[source]
I'm confident that I'll be able to know and control whether or not my Linux and BSD machines will be using ML models.
replies(2): >>41866482 #>>41875239 #
43. JohnFen ◴[] No.41866451{3}[source]
Yes, but I'm not interested in those sorts of uses. I'm wondering what else an NPU could be used for. I don't know what an NPU actually is at a technical level, so I'm ignorant of the possibilities.
replies(1): >>41867894 #
44. kalleboo ◴[] No.41866463{4}[source]
It's still in beta so we'll see how things go, but I saw someone testing what Apple Intelligence ran on-device vs. what it sent off to the "private secure cloud", and even stuff like text summaries were being sent to the cloud.
45. hollerith ◴[] No.41866482{4}[source]
--and whether anyone is using your interactions with your computer to train a model.
replies(1): >>41875252 #
46. 0xDEAFBEAD ◴[] No.41866598{4}[source]
Says a lot about HN that so many believed he was genuine.
47. positr0n ◴[] No.41866644{3}[source]
Good discussion on how at multi GHz clock speeds, the speed of light is actually limiting on some circuit design choices: https://news.ycombinator.com/item?id=12384596
48. ezst ◴[] No.41866907{3}[source]
I'm fine with the mockery, I genuinely hadn't realized that "wanting to pay for what one needs" was such a hot and controversial take.
replies(1): >>41867899 #
49. IshKebab ◴[] No.41866947{4}[source]
Cache doesn't use nearly as much power as active computation; that was my point.
50. joha4270 ◴[] No.41867074{5}[source]
No.

Every successive semiconductor node uses less power per transistor than the previous one at the same clock speed. It's just that we then immediately use this headroom to pack more transistors closer together and run them faster, so every chip keeps running into power limits, even as it continually does more with said power.
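
As a toy example of why density still wins out (the scaling factors below are made up, just in a plausible ballpark):

    # Dynamic power per transistor scales roughly with C * V^2 * f; assume a
    # shrink doubles transistor density but only cuts per-transistor power to 65%.
    density_gain = 2.0            # assumed: 2x transistors per mm^2
    power_per_transistor = 0.65   # assumed: 65% of the previous node's power
    print(f"power density after shrink: {density_gain * power_per_transistor:.2f}x")  # 1.30x
    # W/mm^2 goes up unless some of the new transistors stay dark or clocks drop.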

51. ywvcbk ◴[] No.41867610{4}[source]
> box

Presumably you have a GPU? If so, there is nothing an NPU can do that a discrete GPU can't (and the NPU would be much slower than a recent GPU).

The real benefits are power efficiency and cost since they are built into the SoC which are not necessarily that useful on a desktop PC.

52. michaelt ◴[] No.41867686{6}[source]
It has been the norm for several decades to have hardware features that go unused.

The realities of mass manufacturing and supply chains and whatnot mean it's cheaper to get a laptop with a webcam I don't use, a fingerprint reader I don't use, and an SD card reader I don't use. It's cheaper to get a CPU with integrated graphics I don't use, a trusted execution environment I don't use, remote management features I don't use. It's cheaper to get a discrete GPU with RGB LEDs I don't use, directx support I don't use, four outputs when I only need one. It's cheaper to get a motherboard with integrated wifi than one without.

53. ItsBob ◴[] No.41867894{4}[source]
I'm probably about to show my ignorance here (I'm not neck-deep in the AI space but I am a software architect...) but are they not just dedicated matrix multiplication engines (plus some other AI stuff)? So instead of asking the CPU to do the math, you have a dedicated area that does it instead... well, that's my understanding of it.

As to why, I think it's along the lines of this: the CPU does 100 things, one of those is AI acceleration. Let's take the AI acceleration and give it its own space instead so we can keep the power down a bit, add some specialization, and leave the CPU to do other stuff.

Again, I'm coming at this from a high-level as if explaining it to my ageing parents.
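
Concretely, the work being hardwired is mostly multiply-accumulates over low-precision values. A NumPy stand-in just to show the shape of it (not how any real NPU is programmed):

    # The core NPU operation: int8 multiply-accumulates summed into a wider
    # accumulator. NumPy here only illustrates the math an NPU hardwires.
    import numpy as np

    rng = np.random.default_rng(0)
    activations = rng.integers(-128, 128, size=(1, 256), dtype=np.int8)   # one input row
    weights = rng.integers(-128, 128, size=(256, 64), dtype=np.int8)      # one layer's weights

    acc = activations.astype(np.int32) @ weights.astype(np.int32)         # accumulate in int32
    print(acc.shape, acc.dtype)  # (1, 64) int32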

replies(1): >>41868347 #
54. ginko ◴[] No.41867899{4}[source]
The extra cost of the die area spent on NPU cores is pretty hard to quantify. I guess removing it would allow higher yields and more chips per wafer, but then you'd need to set up tooling for two separate runs (one with the NPU and one without). Add to that that most of the cost is actually the design of the chip, and it's clear why manufacturers just always add the extra features. Maybe they could sell a chip with the NPU permanently disabled, but I guess that wouldn't be what you want either?

Fwiw there should be no power downside to having an unused unit. It’ll just not be powered.

replies(1): >>41870106 #
55. JohnFen ◴[] No.41868347{5}[source]
Yes, that's my understanding as well. What I meant is that I don't know the fine details. My ignorance is purely because I don't actually have a machine that has an NPU, so I haven't bothered to study up on them.
56. coldpie ◴[] No.41869893{3}[source]
There is a chronological view tucked in there somewhere, but they really do hide it behind the other crap. Once you manage to get into it, it usually stays that way, but sometimes it kicks me back out to the random nonsense view and I have to take a few minutes to find chronological again.
57. bcoates ◴[] No.41870035{3}[source]
Microsoft has declared a whole lot of things to be the future of Windows, almost all of them were quietly sidelined in a version or two.

https://www.joelonsoftware.com/2002/01/06/fire-and-motion/

replies(1): >>41871052 #
58. ezst ◴[] No.41870106{5}[source]
The argument boils down to "since it's there, better to keep it because making a version without it would defeat economies of scale and not save much, if at all", and that's a sensible take… under the assumption that there's a general demand for NPUs, which I contest.

In practice, everyone is paying a premium for NPUs that only a minority wants, and only a fraction of that minority actually does "something" with them.

This thread really helps to show that the use-cases are few, non-essential, and that the general application landscape hasn't adopted NPUs and has very little incentive to do so (because of the alien programming model, because of hardware compat across vendors, because of the ecosystem being a moving target with little stability in sight, and because of the high-effort/low-reward in general).

I do want to be wrong, of course. Tech generally is exciting because it offers new tools to crack old problems, opening new avenues and opportunities in the process. Here it looks like we have a solution in search of a problem, one set by marketing departments.

replies(1): >>41872019 #
59. shermantanktop ◴[] No.41870152{5}[source]
It whooshed over my head too. That’s the danger of sarcasm…it’s a cooperative form of humor but the other party might not get it.
60. VHRanger ◴[] No.41870432{3}[source]
SRAM is extremely hot, it's the very opposite of dark silicon
61. consteval ◴[] No.41870713[source]
We already can't fit much more in CPUs. You can't just throw cores in there. CPUs these days are, like, 80% cache if you look at the die. We constantly shrink the compute part, but we don't put much more compute - that space is just used for cache.

So, I'm not sure that you're wasting much with the NPU. But I'm not an expert.

62. jsheard ◴[] No.41871052{4}[source]
Yeah, but the lead times on silicon mean we're going to be stuck with Microsoft's decision for a while regardless of how hard they commit to it. AMD and Intel probably already have two or three future generations of Copilot+ CPUs in the pipeline.
63. nioj ◴[] No.41871830{3}[source]
I think the music part is related to the setting called "Show Featured Content" in the Photos app settings.
replies(1): >>41873331 #
64. Miraste ◴[] No.41872019{6}[source]
Modern SoCs already have all kinds of features with use-cases that are few and non-essential. Granted, they don't take as much space as NPUs, but manufacturers are betting that if NPUs are available, software will evolve to use them regularly. If it doesn't, they'll probably go away in a few generations. But at a minimum, Microsoft and Apple seem highly committed to using them.
65. Miraste ◴[] No.41872169{4}[source]
In short: no. Current-gen NPUs are so slow they can't do anything useful. AMD and Intel have 2nd-gen ones that came out a few weeks ago, and by spec they may be able to run local translation and small LLMs (haven't seen benchmarks yet), but for now they are laptop-only.
66. jcgrillo ◴[] No.41873331{4}[source]
Yeah that's how I ended up making it stop but if this is what these NPUs are being used for just.. why?
67. barfingclouds ◴[] No.41875051[source]
Agreed. I use perplexity and chat gpt daily, but if I understand correctly that’s all done off device which is fine with me. I don’t need to generate weird images, or have my phone rewrite emails for me, or summarize stuff.
68. heavyset_go ◴[] No.41875239{4}[source]
I agree with the premise as a Linux user myself, but if you're using any JetBrains products, or Zoom, you're running models on the client-side. I suspect small models will continue to creep into apps. Even Firefox ships ML models in the browser.
69. heavyset_go ◴[] No.41875252{5}[source]
Luckily, while NPUs do nothing about data exfiltration, they're a poor solution for training models. Your data is still going to get sucked up to the mothership, but offloading training to your machine hopefully won't happen.
replies(1): >>41879416 #
70. hollerith ◴[] No.41879416{6}[source]
Yes, when I was writing my comment, I was imagining my user-interaction data getting sucked up to data centers.