602 points emrah | 6 comments
1. mark_l_watson No.43745755
Indeed!! I have swapped out qwen2.5 for gemma3:27b-it-qat, running under Ollama, for routine work on my 32GB Mac.

gemma3:27b-it-qat with open-codex, running locally, is just amazingly useful, not only for Python dev, but for Haskell and Common Lisp also.

I still like Gemini 2.5 Pro and o3 for brainstorming or for working on difficult problems, but for routine work it simply feels good to have everything open source/open weights and running on my own system.

When I bought my 32GB Mac a year ago, I didn't expect to end up this happy running gemma3:27b-it-qat with open-codex locally.

replies(3): >>43750006, >>43750021, >>43750815
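
For anyone who wants to poke at the same setup without open-codex in the loop, here is a minimal sketch of calling the local Ollama HTTP API directly from Python. It assumes Ollama's default port (11434) and that the gemma3:27b-it-qat tag has already been pulled; the prompt is just a placeholder, not anything from the parent comment's workflow.

    import json
    import urllib.request

    # Minimal sketch: one non-streaming generation request against a
    # local Ollama server (default port 11434 assumed).
    payload = {
        "model": "gemma3:27b-it-qat",
        "prompt": "Write a Python function that reverses a linked list.",
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])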
2. Tsarp No.43750006
What tps (tokens/sec) are you hitting? And did you have to change the KV cache size?
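
For reference on both questions: Ollama reports generation metadata in its non-streaming responses (eval_count tokens over eval_duration nanoseconds), which gives tokens/sec directly, and the context window is set per request via the num_ctx option; a bigger num_ctx means a bigger KV cache and more memory. A hedged sketch, assuming the default local endpoint and the model tag from the parent comment:

    import json
    import urllib.request

    # Sketch: measure tokens/sec from Ollama's own response metadata
    # while raising the context window. Default port assumed.
    payload = {
        "model": "gemma3:27b-it-qat",
        "prompt": "Explain tail recursion in one paragraph.",
        "stream": False,
        "options": {"num_ctx": 8192},  # larger context -> larger KV cache
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())

    # eval_count tokens generated in eval_duration nanoseconds.
    tps = body["eval_count"] / body["eval_duration"] * 1e9
    print(f"{tps:.1f} tokens/sec")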
3. nxobject No.43750021
Fellow owner of a 32GB MBP here: how much memory does it use while resident? And if swapping happens, do you see the effects in your day-to-day work? I'm in the awkward position of using a lot of bloated, virtualized Windows software (mostly SAS) on a daily basis.
replies(1): >>43751621
4. pantulis No.43750815
How did you manage to run open-codex against a local Ollama? I keep getting 400 errors no matter what I try with the --provider and --model options.
replies(1): >>43751321
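
A common cause of 400s when pointing an OpenAI-protocol client at Ollama is a model tag that doesn't match what the server actually has installed. A hedged way to check, assuming the default port and leaving open-codex out of the picture entirely: list the server's installed tags, then hit Ollama's OpenAI-compatible endpoint directly.

    import json
    import urllib.request

    BASE = "http://localhost:11434"  # default Ollama port assumed

    # 1. List the model tags the server knows about; a 400 from an
    #    OpenAI-protocol client is often just a tag that isn't here.
    with urllib.request.urlopen(f"{BASE}/api/tags") as resp:
        tags = [m["name"] for m in json.loads(resp.read())["models"]]
    print("installed:", tags)

    # 2. Exercise Ollama's OpenAI-compatible chat endpoint directly,
    #    to separate client configuration problems from server ones.
    payload = {
        "model": "gemma3:27b-it-qat",
        "messages": [{"role": "user", "content": "Say hello."}],
    }
    req = urllib.request.Request(
        f"{BASE}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])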
5. pantulis No.43751321
Never mind: I found your Leanpub book, followed the instructions, and at least have it running with qwen-2.5. I'll investigate what happens with Gemma.
6. mark_l_watson No.43751621
I have the usual programs running on my Mac, along with open-codex: Emacs, web browser, terminals, VSCode, etc. Even with large contexts, open-codex with Ollama and Gemma 3 27B QAT does not seem to overload my system.

To be clear, I sometimes toggle open-codex over to the Gemini 2.5 Pro API as well, but I enjoy running locally for simpler routine work.