Devstral | slacker news

1. oofbaroomf [21 May 25 18:19 UTC] No.44054477

The SWE-Bench scores are very, very high for an open-source model of this size. 46.8% is better than o3-mini (with Agentless-lite) and Claude 3.6 (with AutoCodeRover), though it is a little lower than Claude 3.6 with Anthropic's proprietary scaffold. And considering you can run this almost for free, this is an extraordinary model.

2. falcor84 [21 May 25 20:53 UTC] No.44056216

>>44054477 (TP)

Just to confirm, are you referring to Claude 3.7?

3. oofbaroomf [21 May 25 20:56 UTC] No.44056250

>>44056216

No. I am referring to Claude 3.5 Sonnet New, released October 22, 2024, with model ID claude-3-5-sonnet-20241022, colloquially referred to as Claude 3.6 Sonnet because of Anthropic's confusing naming.

4. SkyPuncher [21 May 25 20:59 UTC] No.44056271

>>44056250

> colloquially referred to as Claude 3.6

Interesting. I've never heard this.

5. Deathmax [21 May 25 21:13 UTC] No.44056382

>>44056250

Also known as Claude 3.5 Sonnet V2 on AWS Bedrock and GCP Vertex AI

6. AstroBen [21 May 25 21:41 UTC] No.44056570

>>44054477 (TP)

Extraordinary... or a sign that the benchmarks aren't doing their job.

7. ttoinou [21 May 25 22:07 UTC] No.44056760

>>44056250

And it is a very good LLM. Some people complain they don't see an improvement with Sonnet 3.7

8. simonw [21 May 25 23:09 UTC] No.44057177

>>44056271

It's the reason Anthropic called their next release 3.7 Sonnet - the 3.6 version number was already being used by some in the community to refer to their 3.5v2.

9. echelon [22 May 25 00:25 UTC] No.44057569

>>44056570

I wasn't considering Mistral for anything, but this show of goodwill to open source is amazing. I'll have to give this a try.

10. sagarpatil [22 May 25 02:44 UTC] No.44058287

>>44054477 (TP)

They are referring to SWE bench lite. Just want to make sure you are too.

11. turing_complete [22 May 25 08:40 UTC] No.44060064

>>44056271

because nobody says that

12. svantana [22 May 25 11:32 UTC] No.44060996

>>44058287

Where did you get that idea? In the post they are repeatedly referring to SWEBench-Verified and nothing else.

13. moffkalast [22 May 25 11:42 UTC] No.44061050

>>44056250

The model formerly known as Claude 3.6 Sonnet?

14. NiloCK [22 May 25 12:13 UTC] No.44061275

>>44060064

Anthropic moved from 3.5, to 3.5(new), to 3.7. They skipped 3.6 because of usage in the community, and because 3.5(newer) probably passed some threshold of awfulness.

People also use 3.5.1 to refer to 3.5(new)/3.6.

The remaining difficulty now is when people refer to 3.5, without specifying (new) or (old). I find most unspecified references to 3.5 these days are actually to 3.6 / 3.5.1 / 3.5(new), which is confusing.

15. qeternity [22 May 25 13:59 UTC] No.44062135

>>44057569

Mistral have a long history of open weight models...

16. skerit [22 May 25 15:03 UTC] No.44062804

>>44060064

That's not correct. I have always referred to it as v3.6, and I've seen plenty of other people do so too. It's why their next model was called v3.7

17. alhimik45 [23 May 25 05:37 UTC] No.44070156

>>44062135

But at the same time they don't open weights of Codestral...

18. sagarpatil [26 May 25 03:47 UTC] No.44093810

>>44060996

Sorry. I was wrong.