Devstral

(mistral.ai)

701 points mfiguiere | 1 comments | 21 May 25 14:21 UTC

ics ◴[21 May 25 17:40 UTC] No.44054028[source]▶

Maybe someone here can suggest tools, or at least where to look: what are the state-of-the-art models to run locally on relatively low-power machines like a MacBook Air? Is anyone tracking what is feasible for a given machine spec?

"Apple Intelligence" isn't it, but it would be nice to know, without churning through tests, whether I should bother keeping 2-3 models around in ollama for specific tasks or, if their performance is marginal, whether there's a more stable all-rounder model.

replies(3): >>44054653 #>>44056458 #>>44058187 #

Miraste ◴[21 May 25 21:26 UTC] No.44056458[source]▶

>>44054028 #

The best general model you can run locally is probably some version of Gemma 3 or the latest Mistral Small. On a Windows machine, this is limited by VRAM, since system RAM is too low-bandwidth to run models at usable speeds. On an M-series Mac, the system memory is on-package and fast enough to use. What you can run is bounded by the total RAM, minus whatever macOS uses and the space you want for other programs.

To determine how much space a model needs, look at the size of the quantized (lower precision) model file on HuggingFace or wherever it's hosted. Q4_K_M is a good default. As a rough rule of thumb, the file size in gigabytes will be a little over half the parameter count in billions. For Devstral (24B parameters), that's 14.3GB. You will also need 1-8GB more than that to store the context.

For example: A 32GB MacBook Air could use Devstral at 14.3+4GB, leaving ~14GB for the system and applications. A 16GB MacBook Air could use Gemma 3 12B at 7.3+2GB, leaving ~7GB for everything else. An 8GB MacBook could use Gemma 3 4B at 2.5GB+1GB, but this is probably not worth doing.
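
The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function names and the 0.6 GB-per-billion-parameters factor are assumptions chosen to roughly match the figures quoted here, not values from any model card, so check the actual file size on the hosting page.

```python
def q4_size_gb(params_billions, gb_per_billion=0.6):
    """Rough Q4_K_M file size in GB.

    gb_per_billion=0.6 is an assumed rule-of-thumb factor
    ("a little over half"); real files vary by model.
    """
    return round(params_billions * gb_per_billion, 1)


def memory_budget(total_ram_gb, model_gb, context_gb):
    """Return (GB needed by the model, GB left for the OS and other apps)."""
    needed = round(model_gb + context_gb, 1)
    left = round(total_ram_gb - needed, 1)
    return needed, left


# The three examples from the comment, using the quoted file sizes.
for name, ram, model, ctx in [
    ("Devstral on 32GB", 32, 14.3, 4),
    ("Gemma 3 12B on 16GB", 16, 7.3, 2),
    ("Gemma 3 4B on 8GB", 8, 2.5, 1),
]:
    needed, left = memory_budget(ram, model, ctx)
    print(f"{name}: needs {needed}GB, leaves {left}GB")
```

If the leftover figure is close to what the OS and a browser already use, the model will technically load but the machine will swap, which is why the 8GB case is marginal.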

replies(1): >>44059545 #

1. visarga ◴[22 May 25 07:15 UTC] No.44059545[source]▶

>>44056458 #

> An 8GB MacBook could use Gemma 3 4B at 2.5GB+1GB, but this is probably not worth doing.

I am currently using this model on a MacBook with 16GB RAM. It is hooked up to a Chrome extension that extracts text from webpages, logs it to a file, then summarizes each page. I want to develop an episodic memory system like MS Recall, but local: it does not leak my data to anyone else and costs me nothing.

Gemma 3 4B runs under ollama and is light enough that I don't feel it while browsing. Summarization happens in the background. This page I am on is already logged and summarized.
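
For anyone wanting to try something similar, here is a minimal sketch of the summarize-and-log step, assuming ollama is serving on its default local port and exposes its `/api/generate` endpoint. The `gemma3:4b` model tag, the prompt wording, the `pages.jsonl` log file, and the function names are all assumptions for illustration; this is not the actual extension described above.

```python
import json
import urllib.request

# Default local ollama endpoint (assumes an unmodified install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(page_text, model="gemma3:4b", max_chars=8000):
    """Build the JSON body for a non-streaming summarization request.

    max_chars is an assumed cap to keep the prompt within a small context.
    """
    prompt = "Summarize the following web page in 3 sentences:\n\n" + page_text[:max_chars]
    return {"model": model, "prompt": prompt, "stream": False}


def summarize_and_log(page_text, log_path="pages.jsonl"):
    """Ask the local model for a summary and append text + summary to a log file."""
    body = json.dumps(build_request(page_text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        summary = json.loads(resp.read())["response"]
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"text": page_text, "summary": summary}) + "\n")
    return summary
```

Nothing leaves the machine in this setup: the request goes to localhost and the log is a plain file, so searching the "memory" later is just a matter of reading `pages.jsonl`.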
