(developers.googleblog.com)

602 points emrah | 1 comments | 20 Apr 25 12:22 UTC

wtcactus ◴[20 Apr 25 13:33 UTC] No.43743666[source]▶

>>43743337 (OP) #

They keep mentioning the RTX 3090 (with 24 GB VRAM), but the model is only 14.1 GB.

Shouldn’t it fit on a 5060 Ti 16GB, for instance?

replies(3): >>43743691 #>>43743768 #>>43747505 #

1. oktoberpaard ◴[20 Apr 25 13:52 UTC] No.43743768[source]▶

>>43743666 #

With a 128K context length and an 8-bit KV cache, the 27B model occupies 22 GiB on my system. With a smaller context length, you should be able to fit it on a 16 GiB GPU.
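
To see why context length matters so much here, a rough back-of-the-envelope sketch may help. It uses the common estimate that a KV cache stores one key and one value vector per layer, per KV head, per token. The layer count, KV head count, and head dimension below are placeholder values for illustration only, not confirmed figures for Gemma 3 27B, and the formula treats every layer as full attention, so it is an upper bound: runtimes that cache only a sliding window on some layers will use less. It also ignores compute buffers and other runtime overhead.

```python
GIB = 2**30  # bytes per GiB (binary); note 1 GB (decimal) is 10**9 bytes


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Upper-bound KV cache size, assuming full attention on every layer.

    The factor 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def gb_to_gib(gb):
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * 10**9 / GIB


# Hypothetical architecture values, chosen for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 62, 16, 128
BYTES_8BIT = 1  # 8-bit KV cache: one byte per element

weights_gib = gb_to_gib(14.1)  # the 14.1 GB model size quoted in the thread

for context_len in (8 * 1024, 32 * 1024, 128 * 1024):
    cache_gib = kv_cache_bytes(
        N_LAYERS, N_KV_HEADS, HEAD_DIM, context_len, BYTES_8BIT
    ) / GIB
    print(
        f"context {context_len:>6}: "
        f"KV cache <= {cache_gib:.2f} GiB, "
        f"weights ~ {weights_gib:.2f} GiB"
    )
```

The cache grows linearly with context length while the weights stay fixed, which is consistent with the point above: the 14.1 GB file is only part of the footprint, and shrinking the context is the easiest way to get under 16 GiB.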

Gemma 3 QAT Models: Bringing AI to Consumer GPUs