Gemma 3 QAT Models: Bringing AI to Consumer GPUs (developers.googleblog.com)
602 points by emrah | 5 comments | 20 Apr 25 12:22 UTC
1. wtcactus | 20 Apr 25 13:33 UTC | No. 43743666
>>43743337 (OP)
They keep mentioning the RTX 3090 (with 24 GB VRAM), but the model is only 14.1 GB. Shouldn't it fit a 5060 Ti 16 GB, for instance?
replies(3): >>43743691 >>43743768 >>43747505
2. jsnell | 20 Apr 25 13:37 UTC | No. 43743691
>>43743666 (TP)
Memory is needed for more than just the parameters, e.g. the KV cache.
replies(1): >>43743879
3. oktoberpaard | 20 Apr 25 13:52 UTC | No. 43743768
>>43743666 (TP)
With a 128K context length and an 8-bit KV cache, the 27b model occupies 22 GiB on my system. With a smaller context length, you should be able to fit it on a 16 GiB GPU.
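[ed.: the KV-cache arithmetic behind these comments can be sketched with the standard dense-attention upper bound. The model parameters below are illustrative assumptions, not confirmed Gemma 3 27B values, and the formula ignores sliding-window attention, which reduces the cache for local-attention layers.]

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    """Rough upper bound on KV-cache memory for a decoder-only transformer.

    The factor of 2 accounts for the separate key and value tensors
    stored per layer; bytes_per_elem is 1 for q8, 2 for fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical config: 62 layers, 16 KV heads, head_dim 128,
# 8K context, q8 cache (1 byte per element).
gib = kv_cache_bytes(62, 16, 128, 8192, 1) / 2**30
print(f"{gib:.2f} GiB")  # 1.94 GiB for this illustrative config
```

The same formula scales linearly with context length, which is why dropping from 128K to a few thousand tokens of context frees enough VRAM to matter on a 16 GiB card.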
4. cubefox | 20 Apr 25 14:12 UTC | No. 43743879
>>43743691
KV = key-value
5. Havoc | 21 Apr 25 00:09 UTC | No. 43747505
>>43743666 (TP)
Just checked: 19 GB with 8K context @ q8 KV cache, plus another 2.5 GB or so for the OS etc. ...so yeah, 3090.
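[ed.: a back-of-the-envelope check of the budget in this comment, using only the figures quoted above; nothing here is independently measured.]

```python
# Figures as reported in the thread, not independently measured.
model_plus_kv_gb = 19.0  # 27B QAT weights + 8K context @ q8 KV cache
os_overhead_gb = 2.5     # reported allowance for OS / desktop compositor
total_gb = model_plus_kv_gb + os_overhead_gb
print(total_gb)  # 21.5 -> under a 3090's 24 GB, over a 16 GB card
```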