Has anyone packaged one of these in an iPhone app? I am sure it is doable, but I am curious what tokens/sec is possible these days. I would love to ship "private" AI apps if we can get reasonable tokens/sec.
replies(4):
For basic conversation and RAG, you can use TinyLlama or Qwen2.5-0.5B, both of which run on a Raspberry Pi at around 5-20 tokens per second.
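
If you want to check what you actually get on your own hardware, a rough sketch like this can time any streaming generator. The names `tokens_per_second` and `fake_stream` are made up for illustration and are not part of any library; `fake_stream` just sleeps to stand in for a real model, so swap it for whatever iterator your runtime yields tokens from. Note that the timer starts before the first token, so prompt processing time is included in the average.

```python
import time
from typing import Callable, Iterable, Iterator


def tokens_per_second(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume a token stream and return the average tokens/sec.

    The clock is read once before the first token and once after the
    last, so time-to-first-token is part of the measurement.
    """
    start = clock()
    count = 0
    for _ in stream:
        count += 1
    elapsed = clock() - start
    if count == 0:
        return 0.0
    return count / elapsed if elapsed > 0 else float("inf")


def fake_stream(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one token per delay."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    # Simulated generation; replace fake_stream with a real token iterator.
    rate = tokens_per_second(fake_stream(n_tokens=20, delay_s=0.05))
    print(f"{rate:.1f} tokens/sec")
```

The `clock` parameter is only there so the function can be checked with a fake clock; in normal use the default is fine.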