Has anyone packaged one of these in an iPhone app? I'm sure it's doable, but I'm curious what tokens/sec is possible these days. I would love to ship "private" AI apps if we can get reasonable tokens/sec.
Qwen2.5-0.5B is the tiniest useful model I've used, and it's pretty easy to fine-tune locally on a Mac. I haven't tried it on an iPhone, but given that it runs at about 150-200 tokens/second on a Mac, I'm kinda doubtful it could match that on an iPhone. But I guess you'd just have to try.
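If you do try it, a rough way to get a tokens/sec number is to time one streamed generation and divide the token count by the elapsed time. Here's a minimal sketch in Python, assuming whatever runtime you use can hand you tokens one at a time. `tokens_per_second` and `fake_stream` are hypothetical names for illustration, not any real library's API, and the fake stream just sleeps instead of running a model:

```python
import time
from typing import Callable, Iterable


def tokens_per_second(
    generate: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time one streamed generation and return tokens per second."""
    start = clock()
    count = sum(1 for _ in generate())  # consume the whole stream
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else 0.0


def fake_stream(n_tokens: int = 50, delay_s: float = 0.001):
    # Hypothetical stand-in for a real model's token stream.
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_stream):.1f} tokens/sec")
```

Note this lumps prompt processing and decoding together, so on a real model you'd probably want to start the clock after the first token if you only care about decode speed.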