
602 points by emrah | 3 comments
perching_aix No.43744332
This is my first time trying to locally host a model - gave both the 12B and 27B QAT models a shot.

I was both impressed and disappointed. Setup was piss easy, and the models are great conversationalists. I have a 12 GB card available, and the 12B model ran nicely and swiftly on it.

However, they're seemingly terrible at actually assisting with stuff. I tried something very basic: asked for a PowerShell one-liner to get the native block size of my disks. It ended up hallucinating fields, then sent me off into the deep end: first elevating to admin, then using WMI, then bringing up IOCTLs. Pretty unfortunate; not sure I'll be able to put it to actual meaningful use as a result.
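For reference, the task does have a clean answer. A minimal sketch, assuming Windows 8/Server 2012 or later where the Storage module's Get-PhysicalDisk cmdlet is available (no elevation, WMI, or IOCTLs needed):

  # Report each disk's logical and physical (native) sector size in bytes.
  Get-PhysicalDisk | Select-Object FriendlyName, LogicalSectorSize, PhysicalSectorSize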

replies(4): >>43744568 >>43744683 >>43747309 >>43748148
1. terhechte No.43744683
Local models, due to their smaller size, favor popular languages even more strongly than the big cloud models do. They work fantastically for JavaScript, Python, and Bash, but much worse for less popular languages like Clojure, Nim, or Haskell. PowerShell is probably on the less popular side compared to JS or Bash.

If this is your main use case, you can always try to fine-tune a model. I maintain a small LLM benchmark across different programming languages, and the performance difference between, say, Python and Rust on some smaller models is up to 70%.

replies(1): >>43744769
2. perching_aix No.43744769
How accessible and viable is model fine-tuning? I'm not in the loop at all, unfortunately.
replies(1): >>43752517
3. terhechte No.43752517
This is a very accessible way of playing around with the topic: https://transformerlab.ai