slacker news
Gemma 3 QAT Models: Bringing AI to Consumer GPUs
(developers.googleblog.com)
602 points by emrah | 2 comments | 20 Apr 25 12:22 UTC
justanotheratom | 20 Apr 25 14:23 UTC | No. 43743956
>>43743337 (OP)
Has anyone packaged one of these in an iPhone app? I'm sure it's doable, but I'm curious what tokens/sec is possible these days. I would love to ship "private" AI apps if we can get reasonable tokens/sec.
replies(4): >>43743983 >>43744244 >>43744274 >>43744863
nolist_policy | 20 Apr 25 16:44 UTC | No. 43744863
>>43743956
FWIW, I can run Gemma-3-12b-it-qat on my Galaxy Fold 4 with 12 GB of RAM at around 1.5 tokens/s. I use plain llama.cpp under Termux.
replies(1): >>43745150
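For anyone wanting to try the same setup, a rough sketch of the steps follows. The commenter only said "plain llama.cpp with Termux", so the package list, the Hugging Face file name, and the binary/flags are assumptions based on typical llama.cpp builds, not something the commenter specified:

```shell
# Inside Termux on Android: install a build toolchain
# (package names assumed; check `pkg search` on your device)
pkg install clang cmake git wget

# Build llama.cpp from source (CPU-only build)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Fetch a QAT GGUF of Gemma 3 12B; the repo and file name below are
# an assumption -- substitute whatever quantized GGUF you actually use
wget https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-gguf/resolve/main/gemma-3-12b-it-q4_0.gguf

# Run an interactive prompt; expect low single-digit tokens/s on a phone
./build/bin/llama-cli -m gemma-3-12b-it-q4_0.gguf -p "Hello" -n 64
```

Note the model file is around 7-8 GB at 4-bit, so a 12 GB device like the Fold 4 is near the practical floor for the 12B variant; smaller Gemma 3 sizes would leave more headroom.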
Casteil | 20 Apr 25 17:31 UTC | No. 43745150
>>43744863 (TP)
Does this turn your phone into a personal space heater too?