(mistral.ai)

701 points mfiguiere | 3 comments | 21 May 25 14:21 UTC

oofbaroomf [21 May 25 18:19 UTC] No.44054477

The SWE-bench scores are very, very high for an open-source model of this size. 46.8% is better than o3-mini (with Agentless-lite) and Claude 3.6 (with AutoCodeRover), though a little lower than Claude 3.6 with Anthropic's proprietary scaffold. And considering you can run this almost for free, this is an extraordinary model.

1. sagarpatil [22 May 25 02:44 UTC] No.44058287

>>44054477

They are referring to SWE-bench Lite. Just want to make sure you are too.

2. svantana [22 May 25 11:32 UTC] No.44060996

>>44058287

Where did you get that idea? In the post they repeatedly refer to SWE-bench Verified and nothing else.

3. sagarpatil [26 May 25 03:47 UTC] No.44093810

>>44060996

Sorry. I was wrong.

Devstral