486 points dbreunig | 23 comments
eightysixfour ◴[] No.41863546[source]
I thought the purpose of these things was not to be fast, but to be able to run small models with very little power usage? I have a newer AMD laptop with an NPU, and my power usage doesn't change using the video effects that supposedly run on it, but goes up when using the nvidia studio effects.

It seems like the NPUs are for very optimized models that do small tasks, like eye contact, background blur, autocorrect models, transcription, and OCR. In particular, on Windows, I assumed they were running the full screen OCR (and maybe embeddings for search) for the Recall feature.

replies(7): >>41863632 #>>41863779 #>>41863821 #>>41863886 #>>41864628 #>>41864828 #>>41869772 #
1. boomskats ◴[] No.41863779[source]
That's especially true because yours is a Xilinx FPGA. The one that they just attached to the latest gen mobile ryzens is 5x more capable too.

AMD are doing some fantastic work at the moment, they just don't seem to be shouting about it. This one is particularly interesting https://lore.kernel.org/lkml/DM6PR12MB3993D5ECA50B27682AEBE1...

edit: not an FPGA. TIL. :'(

replies(5): >>41863852 #>>41863876 #>>41864048 #>>41864435 #>>41865733 #
2. errantspark ◴[] No.41863852[source]
Wait sorry back up a bit here. I can buy a laptop that has a daughter FPGA in it? Does it have GPIO??? Are we seriously building hardware worth buying again in 2024? Do you have a link?
replies(2): >>41863959 #>>41864293 #
3. beeflet ◴[] No.41863876[source]
It would be cool if most PCs had a general purpose FPGA that could be repurposed by the operating system. For example you could use it as a security processor like a TPM or as a bootrom, or you could repurpose it for DSP or something.

It just seems like this would be better in terms of firmware/security/bootloading because you would be more able to fix it if an exploit gets discovered, and it would be leaner because different operating systems can implement their own stuff (for example linux might not want pluton in-chip security, windows might not want coreboot or linux-based boot, bare metal applications can have much simpler boot).

replies(1): >>41864617 #
4. eightysixfour ◴[] No.41863959[source]
It isn't as fun as you think - they are set up for specific use cases and quite small. Here's a link to the software page: https://ryzenai.docs.amd.com/en/latest/index.html

The teeny-tiny "NPU," which is actually an FPGA, is 10 TOPS.

Edit: I've been corrected, not an FPGA, just an IP block from Xilinx.

replies(2): >>41864036 #>>41864062 #
5. wtallis ◴[] No.41864036{3}[source]
It's not an FPGA. It's an NPU IP block from the Xilinx side of the company. It was presumably originally developed to be run on a Xilinx FPGA, but that doesn't mean AMD did the stupid thing and actually fabbed an FPGA fabric instead of properly synthesizing the design for their laptop ASIC. Xilinx involvement does not automatically mean it's an FPGA.
replies(2): >>41864064 #>>41864111 #
6. pclmulqdq ◴[] No.41864048[source]
It's not an FPGA. It's a VLIW DSP that Xilinx built to go into an FPGA-SoC to help run ML models.
replies(1): >>41864242 #
7. boomskats ◴[] No.41864062{3}[source]
Yes, the one on the ryzen 7000 chips like the 7840u isn't massive, but that's the last gen model. The one they've just released with the HX370 chip is estimated at 50 TOPS, which is better than Qualcomm's ARM flagship that this post is about. It's a fivefold improvement in a single generation, it's pretty exciting.

A̵n̵d̵ ̵i̵t̵'̵s̵ ̵a̵n̵ ̵F̵P̵G̵A̵ It's not an FPGA

replies(1): >>41864248 #
8. eightysixfour ◴[] No.41864064{4}[source]
Thanks for the correction, edited.
9. boomskats ◴[] No.41864111{4}[source]
Do you have any more reading on this? How come the XDNA drivers depend on Xilinx' XRT runtime?
replies(2): >>41864232 #>>41864296 #
10. almostgotcaught ◴[] No.41864232{5}[source]
because XRT has a plugin architecture: XRT<-shim plugin<-kernel driver. The shims register themselves with XRT. The XDNA driver repo houses both the shim and the kernel driver.
replies(1): >>41864611 #
11. almostgotcaught ◴[] No.41864242[source]
this is the correct answer. one of the compilers for this DSP is https://github.com/Xilinx/llvm-aie.
12. almostgotcaught ◴[] No.41864248{4}[source]
> And it's an FPGA.

nope it's not.

replies(1): >>41864925 #
13. dekhn ◴[] No.41864293[source]
If you want GPIOs, you don't need (or want) an FPGA.

I don't know the details of your use case, but I work with low level hardware driven by GPIOs and after a bit of investigation, concluded that having direct GPIO access in a modern PC was not necessary or desirable compared to the alternatives.

replies(1): >>41866390 #
14. wtallis ◴[] No.41864296{5}[source]
It would be surprising and strange if AMD didn't reuse the software framework they've already built for doing AI when that IP block is instantiated on an FPGA fabric rather than hardened in an ASIC.
replies(1): >>41864630 #
15. numpad0 ◴[] No.41864435[source]
Sorry for an OT comment but what is going on with that ascii art!? The content fits within 80 columns just fine[1], is it GPT generated?

1: https://pastebin.com/raw/R9BrqETR

16. boomskats ◴[] No.41864611{6}[source]
Thanks, that makes sense.
17. walterbell ◴[] No.41864617[source]
Xilinx Artix 7-series PicoEVB fits in M.2 wifi slot and has an OSS toolchain, http://www.enjoy-digital.fr/
18. boomskats ◴[] No.41864630{6}[source]
Well, I'm irrationally disappointed, but thanks. Appreciate the correction.
19. boomskats ◴[] No.41864925{5}[source]
I've just ordered myself a jump to conclusions mat.
replies(1): >>41865072 #
20. almostgotcaught ◴[] No.41865072{6}[source]
Lol, during grad school my advisor would frequently cut me off and try to jump to a conclusion while I was explaining something technical, and often enough he was wrong. So I really did buy him one (off eBay or something). He wasn't pleased.
21. davemp ◴[] No.41865733[source]
Unfortunately FPGA fabric is ~2x less power efficient than equivalent ASIC logic at the same clock speeds last time I checked. So implementing general purpose logic on an FPGA is not usually the right option even if you don’t care about FMAX or transistor counts.
22. errantspark ◴[] No.41866390{3}[source]
I get a lot of use out of the PRUs on the BeagleboneBlack, I would absolutely get use out of an FPGA in a laptop.
replies(1): >>41866503 #
23. dekhn ◴[] No.41866503{4}[source]
It makes more sense to me to just use the BeagleboneBlack in concert with the FPGA. Unless you have highly specific compute or data movement needs that can't be satisfied over a USB serial link. If you have those needs, and you need a laptop, I guess an FPGA makes sense but that's a teeny market.