
602 points emrah | 2 comments
justanotheratom ◴[] No.43743956[source]
Has anyone packaged one of these in an iPhone app? I am sure it is doable, but I am curious what tokens/sec is possible these days. I would love to ship "private" AI apps if we can get reasonable tokens/sec.
replies(4): >>43743983 #>>43744244 #>>43744274 #>>43744863 #
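
(For concreteness, a minimal sketch of how an app could measure decode throughput. The TokenGenerator protocol and measureTokensPerSecond function are hypothetical stand-ins for whatever on-device runtime the app embeds, e.g. a llama.cpp or MLX wrapper; they are not from a specific library.)

    import Foundation

    // Hypothetical interface over an on-device runtime that yields
    // one decoded token at a time.
    protocol TokenGenerator {
        mutating func nextToken() -> String?  // nil once generation is done
    }

    // Time the decode loop and report throughput in tokens/second.
    func measureTokensPerSecond<G: TokenGenerator>(_ generator: inout G) -> Double {
        var tokenCount = 0
        let start = Date()
        while generator.nextToken() != nil {
            tokenCount += 1
        }
        let elapsed = Date().timeIntervalSince(start)
        return elapsed > 0 ? Double(tokenCount) / elapsed : 0
    }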
zamadatix ◴[] No.43744274[source]
There are many such apps, e.g. Mollama, Enclave AI, and PrivateLLM, among dozens of others, but you could tell me one runs at 1,000,000 tokens/second on an iPhone and I wouldn't care, because the largest model you're going to be able to load is Gemma 3 4B at q4 (12B won't fit in 8 GB alongside the OS, and you still need room for context), and it's just not worth the time to use.

That said, if you really care: it generates faster than reading speed (on an A18-based iPhone, at least).

replies(1): >>43744535 #
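
(A back-of-envelope sketch of the memory claim above. The bits-per-weight, KV-cache, and OS-headroom figures are assumed ballpark values, not measurements.)

    // Rough fit check for a q4-quantized model on an 8 GB iPhone.
    func estimatedModelMemoryGB(paramsBillions: Double,
                                bitsPerWeight: Double = 4.5,  // q4 weights + quant scales (assumed)
                                kvCacheGB: Double = 1.0) -> Double {  // context/KV cache (assumed)
        let weightsGB = paramsBillions * bitsPerWeight / 8.0  // bits -> bytes
        return weightsGB + kvCacheGB
    }

    let totalRAMGB = 8.0
    let osAndAppGB = 3.0                       // assumed iOS + app headroom
    let availableGB = totalRAMGB - osAndAppGB  // ~5 GB left for the model

    estimatedModelMemoryGB(paramsBillions: 4)   // ~3.25 GB -> fits in ~5 GB
    estimatedModelMemoryGB(paramsBillions: 12)  // ~7.75 GB -> does not fit

Under these assumptions a 4B q4 model fits comfortably while a 12B one overshoots available RAM, which matches the parent comment's Gemma 3 example.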
1. woodson ◴[] No.43744535[source]
Some of these small models still have their uses, e.g. for summarization. Don’t expect them to fully replace ChatGPT.
replies(1): >>43744829 #
2. zamadatix ◴[] No.43744829[source]
The use case is more "I'm willing to accept really bad answers that make things up at an extremely high rate" than anything tied to a particular application. The same goes for summarization: a small model doesn't do it anywhere near as well as a large model would.