slacker news
Gemma 3 QAT Models: Bringing AI to Consumer GPUs
(developers.googleblog.com)
602 points by emrah | 2 comments | 20 Apr 25 12:22 UTC
justanotheratom | 20 Apr 25 14:23 UTC | No. 43743956
>>43743337 (OP)
Has anyone packaged one of these in an iPhone app? I'm sure it's doable, but I'm curious what tokens/sec is possible these days. I would love to ship "private" AI apps if we can get reasonable tokens/sec.
replies(4): >>43743983 >>43744244 >>43744274 >>43744863
nolist_policy | 20 Apr 25 16:44 UTC | No. 43744863
>>43743956
FWIW, I can run Gemma-3-12b-it-qat on my Galaxy Fold 4 with 12 GB of RAM at around 1.5 tokens/s. I use plain llama.cpp under Termux.
replies(1): >>43745150
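For anyone wanting to try the same setup, a rough sketch of the steps follows. The commenter only said "plain llama.cpp with Termux", so the package list, the Hugging Face file name, and the binary/flags are assumptions based on typical llama.cpp builds, not something the commenter specified:

```shell
# Inside Termux on Android: install a build toolchain
# (package names assumed; check `pkg search` on your device)
pkg install clang cmake git wget

# Build llama.cpp from source (CPU-only build)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Fetch a QAT GGUF of Gemma 3 12B; the repo and file name below are
# an assumption -- substitute whatever quantized GGUF you actually use
wget https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-gguf/resolve/main/gemma-3-12b-it-q4_0.gguf

# Run an interactive prompt; expect low single-digit tokens/s on a phone
./build/bin/llama-cli -m gemma-3-12b-it-q4_0.gguf -p "Hello" -n 64
```

Note the model file is around 7-8 GB at 4-bit, so a 12 GB device like the Fold 4 is near the practical floor for the 12B variant; smaller Gemma 3 sizes would leave more headroom.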
Casteil | 20 Apr 25 17:31 UTC | No. 43745150
>>43744863 (TP)
Does this turn your phone into a personal space heater too?