
486 points dbreunig | 1 comment
cjbgkagh No.41865626
> We've tried to avoid that by making both the input matrices more square, so that tiling and reuse should be possible.

While it might be possible, it would not surprise me if a number of possible optimizations had not made it into ONNX. It appears that Qualcomm does not give direct access to the NPU and users are expected to use frameworks to convert models over to it, and in my experience conversion tools generally suck and leave a lot of optimizations on the table. It could be less that NPUs suck and more that the conversion tools suck. I'll wait until I get direct access - I don't trust conversion tools.
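
To make the conversion path concrete: the usual route is something like PyTorch to ONNX to ONNX Runtime with Qualcomm's execution provider, which is roughly the layer the complaint is about. A minimal sketch, assuming a QNN-enabled ONNX Runtime build and the Qualcomm SDK installed; the model and file names are placeholders:

    # Export a toy model to ONNX, then ask ONNX Runtime for the NPU backend.
    import numpy as np
    import torch
    import onnxruntime as ort

    model = torch.nn.Sequential(
        torch.nn.Linear(256, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 16),
    ).eval()

    torch.onnx.export(model, torch.randn(1, 256), "tiny.onnx",
                      input_names=["x"], output_names=["y"])

    # "QNNExecutionProvider" is ONNX Runtime's Qualcomm NPU backend; any op
    # the converter can't map falls back to the CPU provider, which is one
    # way optimizations get left on the table.
    sess = ort.InferenceSession(
        "tiny.onnx",
        providers=["QNNExecutionProvider", "CPUExecutionProvider"],
    )
    y = sess.run(None, {"x": np.random.randn(1, 256).astype(np.float32)})[0]
    print(y.shape)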

My view of NPUs is that they're great for tiny ML models and very fast function approximations, which is my intended use case. While LLMs are the new hotness, there are a huge number of specialized tasks that small models are really useful for.
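
As a toy illustration of the function-approximation idea: fit a tiny MLP to an expensive scalar function once, and afterwards evaluation is just a couple of small matmuls, which is the sort of shape a small NPU handles well. The target function below is an arbitrary stand-in, not any particular real workload:

    # Fit a ~1k-parameter MLP as a cheap surrogate for an "expensive" function.
    import torch

    def expensive_fn(x):
        return torch.sin(3 * x) * torch.exp(-x ** 2)  # stand-in target

    net = torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    x = torch.linspace(-3, 3, 2048).unsqueeze(1)
    y = expensive_fn(x)
    for _ in range(2000):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()

    print(f"final MSE: {loss.item():.2e}")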

replies(2): >>41865847 #>>41868939 #
Hizonner No.41868939
> While LLMs are the new hotness, there are a huge number of specialized tasks that small models are really useful for.

Can you give some examples? Preferably examples that will run continuously enough for even a small model to stay in cache, and are valuable enough to a significant number of users to justify that cache footprint?

I am not saying there aren't any, but I also honestly don't know what they are and would like to.

replies(2): >>41870791 #>>41871252 #
consteval No.41870791
iPhones use a lot of these. There's a bunch of little features that run on the NPU.

Suggestions, predictive text, smart image search, automatic image classification, text selection in images, image processing. These don't run continuously, but I think they are valuable to a lot of users. The predictive text is quite good, and it's very nice to be able to search for vague terms like "license plate" and get images in my camera roll. Plus, selecting text and copying it from images is great.

For desktop use cases, I'm not sure.
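
Apple doesn't document how the Photos search is implemented, but the usual pattern behind that kind of vague-term search is a small joint image/text embedding model plus cosine-similarity ranking over embeddings computed once per photo. A minimal sketch of the mechanism only, with random vectors standing in for real model outputs:

    # Rank photos by cosine similarity to a text query in a shared space.
    import numpy as np

    rng = np.random.default_rng(0)
    EMB_DIM = 256

    # Pretend these came from running each photo through a small vision model.
    photos = rng.standard_normal((10_000, EMB_DIM)).astype(np.float32)
    photos /= np.linalg.norm(photos, axis=1, keepdims=True)

    def search(query_emb, top_k=5):
        """Indices of the photos most similar to the query embedding."""
        q = query_emb / np.linalg.norm(query_emb)
        scores = photos @ q               # one matrix-vector product
        return np.argsort(scores)[::-1][:top_k]

    # Pretend this came from a text encoder given the query "license plate".
    query = rng.standard_normal(EMB_DIM).astype(np.float32)
    print(search(query))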