←back to thread

Devstral

(mistral.ai)
701 points mfiguiere | 9 comments | | HN request time: 0s | source | bottom
Show context
oofbaroomf ◴[] No.44054477[source]
The SWE-Bench scores are very, very high for an open source model of this size. 46.8% is better than o3-mini (with Agentless-lite) and Claude 3.6 (with AutoCodeRover), but it is a little lower than Claude 3.6 with Anthropic's proprietary scaffold. And considering you can run this for almost free, this is a very extraordinary model.
replies(3): >>44056216 #>>44056570 #>>44058287 #
falcor84 ◴[] No.44056216[source]
Just to confirm, are you referring to Claude 3.7?
replies(1): >>44056250 #
1. oofbaroomf ◴[] No.44056250[source]
No. I am referring to Claude 3.5 Sonnet New, released October 22, 2024, with model ID claude-3-5-sonnet-20241022, colloquially referred to as Claude 3.6 Sonnet because of Anthropic's confusing naming.
replies(4): >>44056271 #>>44056382 #>>44056760 #>>44061050 #
2. SkyPuncher ◴[] No.44056271[source]
> colloquially referred to as Claude 3.6

Interesting. I've never heard this.

replies(2): >>44057177 #>>44060064 #
3. Deathmax ◴[] No.44056382[source]
Also known as Claude 3.5 Sonnet V2 on AWS Bedrock and GCP Vertex AI
4. ttoinou ◴[] No.44056760[source]
And it is a very good LLM. Some people complain they don't see an improvement with Sonnet 3.7
5. simonw ◴[] No.44057177[source]
It's the reason Anthropic called their next release 3.7 Sonnet - the 3.6 version number was already being used by some in the community to refer to their 3.5v2.
6. turing_complete ◴[] No.44060064[source]
because nobody says that
replies(2): >>44061275 #>>44062804 #
7. moffkalast ◴[] No.44061050[source]
The model formerly known as Claude 3.6 Sonnet?
8. NiloCK ◴[] No.44061275{3}[source]
Anthropic moved from 3.5, to 3.5(new), to 3.7. They skipped 3.6 because of usage in the community, and because 3.5(newer) probably passed some threshold of awfulness.

People also use 3.5.1 to refer to 3.5(new)/3.6.

The remaining difficulty now is when people refer to 3.5, without specifying (new) or (old). I find most unspecified references to 3.5 these days are actually to 3.6 / 3.5.1 / 3.5(new), which is confusing.

9. skerit ◴[] No.44062804{3}[source]
That's not correct. I have always referred to it as v3.6, and I've seen plenty of other people do so too. It's why their next model was called v3.7