Devstral

(mistral.ai)
701 points by mfiguiere | 14 comments
simonw ◴[] No.44053886[source]
The first number I look at these days is the file size via Ollama, which for this model is 14GB https://ollama.com/library/devstral/tags

I find that on my M2 Mac that number is a rough approximation to how much memory the model needs (usually plus about 10%) - which matters because I want to know how much RAM I will have left for running other applications.

Anything below 20GB tends not to interfere with the other stuff I'm running too much. This model looks promising!
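That plus-10% rule of thumb is easy to sanity-check with a one-liner (the 14GB figure is the Devstral download size mentioned above; the 10% overhead is just the rough heuristic, not a measured number):

```shell
# Rough RAM estimate: model file size on disk plus ~10% overhead
awk 'BEGIN { printf "%.1f GB\n", 14 * 1.1 }'
# prints 15.4 GB
```

So a 14GB model would leave roughly 16-17GB free on a 32GB machine once the rest of the runtime's working memory is accounted for.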

replies(4): >>44054806 #>>44056502 #>>44059216 #>>44059888 #
1. nico ◴[] No.44056502[source]
Any agentic dev software you could recommend that runs well with local models?

I’ve been using Cursor and I’m kind of disappointed. I get better results just going back and forth between the editor and ChatGPT

I tried localforge and aider, but they are kinda slow with local models

replies(6): >>44056637 #>>44057592 #>>44058473 #>>44059316 #>>44064049 #>>44071582 #
2. jabroni_salad ◴[] No.44056637[source]
Do you have any other interface for the model? What kind of tokens/sec are you getting?

Try hooking aider up to Gemini and see how the speed compares. I have noticed that people in the localllama scene do not like to talk about their TPS.

replies(2): >>44056857 #>>44081068 #
3. nico ◴[] No.44056857[source]
The models feel pretty snappy when interacting with them directly via ollama; I'm not sure about the TPS

However, I've also run into two things: 1) most models don't support tools, and it's sometimes hard to find a version of a model that uses tools correctly; 2) even with good TPS, since the agents are usually doing chain-of-thought and running multiple chained prompts, the experience feels slow. This is true even with Cursor using their models/APIs
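For a concrete TPS number without extra tooling, `ollama run <model> --verbose` prints eval token counts and durations after each response; tokens per second is just their ratio. A sketch with made-up numbers (512 tokens and 8.5 seconds are purely illustrative):

```shell
# Hypothetical figures: 512 generated tokens in 8.5 seconds of eval time
awk 'BEGIN { printf "%.1f tok/s\n", 512 / 8.5 }'
# prints 60.2 tok/s
```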

4. ynniv ◴[] No.44057592[source]
https://github.com/block/goose
5. zackify ◴[] No.44058473[source]
I used devstral today with cline and open hands. Worked great in both.

About 1 minute of initial prompt processing time on an M4 Max

Using LM Studio because the Ollama API breaks if you set the context to 128k.
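For reference, the usual way to raise Ollama's context window is a Modelfile setting `num_ctx` (a sketch assuming the `devstral` tag discussed above; this is the setting that reportedly misbehaves at 128k):

```
FROM devstral
PARAMETER num_ctx 131072
```

You would then build it with `ollama create devstral-128k -f Modelfile` and run that variant instead of the default tag.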

replies(2): >>44060526 #>>44062026 #
6. asimovDev ◴[] No.44059316[source]
You can use Ollama in VS Code's Copilot. I haven't personally tried it, but I am interested in how it would perform with Devstral
7. elAhmo ◴[] No.44060526[source]
How is it great that it takes 1 minute for initial prompt processing?
replies(2): >>44080572 #>>44112119 #
8. nico ◴[] No.44062026[source]
Have you tried using MLX or Simon Willison's llm?

https://llm.datasette.io/en/stable/

https://simonwillison.net/tags/llm/

replies(1): >>44112112 #
9. mrshu ◴[] No.44064049[source]
ra-aid works pretty well with Ollama (haven't tried it with Devstral yet though)

https://docs.ra-aid.ai/configuration/ollama/

10. ivanvanderbyl ◴[] No.44071582[source]
I’ve been playing around with Zed; it supports local and cloud models, is really fast, and has a nice UX. It does lack some of the deeper features of VSCode/Cursor, but it is very capable.
11. cheema33 ◴[] No.44080572{3}[source]
That time is just for the very first prompt. It is basically the startup time for the model. Once it is loaded, it is much much faster in responding to your queries. Depending on your hardware of course.
12. segmondy ◴[] No.44081068[source]
People have all sorts of hardware; TPS is meaningless without the full spec of the setup. And the GPU is not the only thing: CPU, RAM speed, memory channels, PCIe speed, inference software, partial CPU offload, RPC, even the OS. All of these things add up, so if someone tells you TPS for a given model, it's meaningless unless you understand their entire setup.
13. zackify ◴[] No.44112112{3}[source]
On LM Studio I was using MLX
14. zackify ◴[] No.44112119{3}[source]
Haha, "great" as in surprisingly good at some simple things that nothing else has been able to do locally for me.

The 1-minute time to first token sucks and has me dreaming of the day we get 3-4x the memory bandwidth