
602 points emrah | 1 comment | HN request time: 0.416s
justanotheratom ◴[] No.43743956[source]
Has anyone packaged one of these in an iPhone app? I'm sure it is doable, but I am curious what tokens/sec is possible these days. I would love to ship "private" AI apps if we can get reasonable tokens/sec.
replies(4): >>43743983 #>>43744244 #>>43744274 #>>43744863 #
nico ◴[] No.43744244[source]
What kind of functionality do you need from the model?

For basic conversation and RAG, you can use tinyllama or qwen-2.5-0.5b, both of which run on a Raspberry Pi at around 5-20 tokens per second
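A minimal sketch of how you might measure that throughput yourself with llama-cpp-python (the library choice, the model filename in the usage note, and the parameter values are assumptions, not something from the thread; any local GGUF build of a small model would do):

```python
import time


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation throughput; guards against a zero timer reading."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0


def benchmark(model_path: str, prompt: str = "Summarize RAG in one sentence.") -> float:
    """Run one short completion and return tokens/second.

    Assumes a local GGUF file, e.g. a 4-bit quant of qwen-2.5-0.5b.
    """
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    start = time.perf_counter()
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    elapsed = time.perf_counter() - start
    return tokens_per_second(out["usage"]["completion_tokens"], elapsed)
```

Something like `benchmark("qwen2.5-0.5b-instruct-q4_k_m.gguf")` (hypothetical filename) would give a rough on-device number to compare against the Pi figures above.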

replies(1): >>43748252 #
justanotheratom ◴[] No.43748252[source]
I am looking for structured output at about 100-200 tokens/second on iPhone 14+. Any pointers?
replies(1): >>43786330 #
nico ◴[] No.43786330[source]
The qwen-2.5-0.5b is the tiniest useful model I've used, and it's pretty easy to fine-tune locally on a Mac. I haven't tried it on an iPhone, but given that it runs at about 150-200 tokens/second on a Mac, I'm kinda doubtful it could do the same on an iPhone. But I guess you'd just have to try
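For the structured-output part of the question, here is a hedged sketch using llama-cpp-python's JSON mode, which constrains decoding to valid JSON via a grammar (the library, function names, and parameters are my assumptions for illustration; the thread itself doesn't specify a toolchain, and on iOS you'd bind llama.cpp directly rather than use Python):

```python
import json
import time


def parse_json_reply(text: str) -> dict:
    """Validate that the model's reply is a single JSON object."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object, got %s" % type(obj).__name__)
    return obj


def structured_query(model_path: str, prompt: str, max_tokens: int = 256):
    """Return (parsed_json, tokens_per_second) for one constrained completion."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    start = time.perf_counter()
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Reply only with a JSON object."},
            {"role": "user", "content": prompt},
        ],
        response_format={"type": "json_object"},  # grammar-constrained decoding
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    reply = parse_json_reply(out["choices"][0]["message"]["content"])
    return reply, out["usage"]["completion_tokens"] / elapsed
```

Note that constrained decoding adds some per-token overhead, so the tokens/sec this reports may come in below an unconstrained benchmark on the same hardware.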