602 points emrah | 2 comments
simonw No.43743896
I think gemma-3-27b-it-qat-4bit is my new favorite local model - or at least it's right up there with Mistral Small 3.1 24B.

I've been trying it on a 64GB M2 via both Ollama and MLX. It's very, very good, and it only uses ~22GB (via Ollama) or ~15GB (MLX), leaving plenty of memory for running other apps.
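
As a rough sanity check on the MLX figure, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It is only a sketch: it assumes a nominal 27 billion parameters, and it ignores quantization scales, the KV cache, and runtime overhead, so real usage will be higher. `weight_gb` is a made-up helper name, not part of any library.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


N_PARAMS = 27e9  # assumption: nominal parameter count of a "27b" model

print(f"4-bit:  {weight_gb(N_PARAMS, 4):.1f} GB")   # 13.5 GB
print(f"16-bit: {weight_gb(N_PARAMS, 16):.1f} GB")  # 54.0 GB
```

The 4-bit estimate lands a little under the ~15GB reported for MLX, which is plausible once the ignored overheads are added back.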

Some notes here: https://simonwillison.net/2025/Apr/19/gemma-3-qat-models/

Last night I had it write me a complete plugin for my LLM tool like this:

  llm install llm-mlx
  llm mlx download-model mlx-community/gemma-3-27b-it-qat-4bit

  llm -m mlx-community/gemma-3-27b-it-qat-4bit \
    -f https://raw.githubusercontent.com/simonw/llm-hacker-news/refs/heads/main/llm_hacker_news.py \
    -f https://raw.githubusercontent.com/simonw/tools/refs/heads/main/github-issue-to-markdown.html \
    -s 'Write a new fragments plugin in Python that registers
    issue:org/repo/123 which fetches that issue
    number from the specified github repo and uses the same
    markdown logic as the HTML page to turn that into a
    fragment'

It gave a solid response! https://gist.github.com/simonw/feccff6ce3254556b848c27333f52... - more notes here: https://simonwillison.net/2025/Apr/20/llm-fragments-github/
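
For readers wondering what the core of such a plugin might look like, here is a minimal sketch. It is not the model's output or the published plugin: the function names are hypothetical, the Markdown layout is my own guess rather than the logic from the HTML page, and registration with LLM's plugin hooks is left out. It only relies on the public GitHub REST endpoints for an issue and its comments.

```python
import json
import urllib.request


def parse_issue_argument(argument: str) -> tuple[str, str, int]:
    """Split 'org/repo/123' into (org, repo, issue_number)."""
    parts = argument.strip().split("/")
    if len(parts) != 3 or not parts[2].isdigit():
        raise ValueError(f"expected org/repo/number, got {argument!r}")
    org, repo, number = parts
    return org, repo, int(number)


def issue_api_url(org: str, repo: str, number: int) -> str:
    """Build the GitHub REST API URL for a single issue."""
    return f"https://api.github.com/repos/{org}/{repo}/issues/{number}"


def issue_to_markdown(issue: dict, comments: list[dict]) -> str:
    """Render an issue and its comments as Markdown (layout is an assumption)."""
    lines = [
        f"# {issue['title']}",
        "",
        f"*Posted by @{issue['user']['login']}*",
        "",
        issue.get("body") or "",
    ]
    for comment in comments:
        lines += [
            "",
            "---",
            "",
            f"### Comment by @{comment['user']['login']}",
            "",
            comment.get("body") or "",
        ]
    return "\n".join(lines).strip() + "\n"


def fetch_json(url: str):
    """GET a URL and decode the JSON response (no auth, no pagination)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def load_issue(argument: str) -> str:
    """Hypothetical loader body: 'org/repo/123' -> Markdown text."""
    org, repo, number = parse_issue_argument(argument)
    url = issue_api_url(org, repo, number)
    return issue_to_markdown(fetch_json(url), fetch_json(url + "/comments"))
```

Calling `load_issue("org/repo/123")` needs network access and only returns the first page of comments; a real plugin would also want authentication and pagination.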
nico No.43744205
Been super impressed with local models on Mac. Love that the Gemma models have a 128k-token input context. However, outputs are usually pretty short.

Any tips on generating long output? Like multiple pages of a document, a story, a play or even a book?

1. Casteil No.43744469
This is basically the opposite of what I've experienced - at least compared to another recent entry like IBM's Granite 3.3.

By comparison, Gemma 3's output (both 12b and 27b) typically seems longer and more verbose, but not problematically so.

2. nico No.43745278
I agree with you. The outputs are usually good; it’s just that for my current use case (writing several pages of long dialogue), the output is not as long as I’d want, and definitely not as long as the model is supposedly capable of producing.
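
One workaround that is sometimes suggested for this, though nobody in this thread confirmed it for Gemma, is to generate a long piece section by section instead of in one shot. The sketch below assumes a hypothetical `generate(prompt) -> str` callable that wraps whatever local model is in use; the prompt wording and the `tail_chars` window are arbitrary choices for illustration.

```python
from typing import Callable


def write_long_document(
    generate: Callable[[str], str],
    premise: str,
    section_titles: list[str],
    tail_chars: int = 1500,
) -> str:
    """Build a long text by prompting once per section.

    Each prompt carries the premise, the full outline, and the tail of the
    text written so far, so the model can continue without re-reading
    everything.
    """
    sections: list[str] = []
    outline = "; ".join(section_titles)
    for index, title in enumerate(section_titles, start=1):
        so_far = "\n\n".join(sections)
        prompt = (
            f"Premise: {premise}\n"
            f"Outline: {outline}\n"
            f"Text so far (tail):\n{so_far[-tail_chars:]}\n\n"
            f"Write section {index} of {len(section_titles)}: {title}. "
            "Continue directly from the text so far and do not wrap up early."
        )
        sections.append(generate(prompt).strip())
    return "\n\n".join(sections)
```

Total length then scales with the number of sections rather than with how much the model is willing to produce in a single response, at the cost of one model call per section.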