397 points Anon84 | 4 comments
barrell ◴[] No.45126116[source]
I recently upgraded a large portion of my pipeline from gpt-4.1-mini to gpt-5-mini. The performance was horrible - after some research I decided to move everything to mistral-medium-0525.

Same price, but dramatically better results, way more reliable, and 10x faster. The only downside is that when it does fail, it seems to fail much harder. Where gpt-5-mini would disregard the formatting in the prompt 70% of the time, mistral-medium follows it 99% of the time, but the other 1% of the time inserts random characters (for whatever reason, normally backticks... which then causes its own formatting issues).

Still, very happy with Mistral so far!
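
The stray-backtick failure mentioned above could be caught with a small post-processing check. This is only a minimal sketch, not the pipeline actually in use: the function name `clean_output` is hypothetical, and it assumes the expected output format never legitimately contains backticks.

```python
def clean_output(text: str) -> tuple[str, bool]:
    """Remove backticks from model output.

    Returns the cleaned text and a flag saying whether anything was removed,
    so the caller can log or retry the request instead of silently fixing it.
    Assumption: a valid response never contains a backtick.
    """
    cleaned = text.replace("`", "")
    return cleaned, cleaned != text


# Example: a response where the model wrapped a word in backticks.
cleaned, was_dirty = clean_output("Revenue rose `12%` in Q2.")
print(cleaned)    # Revenue rose 12% in Q2.
print(was_dirty)  # True
```

Returning the flag rather than only the cleaned string keeps the roughly 1% failure rate measurable over time.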

replies(11): >>45126199 #>>45126266 #>>45126479 #>>45126528 #>>45126707 #>>45126741 #>>45126840 #>>45127790 #>>45129028 #>>45130298 #>>45136002 #
1. viridian ◴[] No.45126528[source]
I'm curious what your prompts look like, as this is the opposite of my experience. I use lmarena for many of the random one-shot questions I have, and I've noticed that mistral-medium is almost always the worse of the two after I blind vote. Feels like it consistently takes losses from qwen, llama, gemini, gpt, you name it. I find it overwhelmingly the most likely to produce factually untrue information in response to an inquiry.

Would you be willing to share an example prompt? I'm curious to see what it's responding well to.

replies(1): >>45127085 #
2. barrell ◴[] No.45127085[source]
I provide it with data and ask it to convert it to prose in specific formats.

Mistral medium is ranked #8 on lmsys arena IIRC, so it’s probably just not your style?

I’m also comparing this to gpt-5-mini, not the big boy

replies(1): >>45130837 #
3. viridian ◴[] No.45130837[source]
I think input strategy probably accounts for the difference. Usually I'm just asking a short question with no additional context, and usually it's not the sort of thing that has one well defined answer. I'm really asking it to summarize the wisdom of the crowd, so to speak.

For example, I ask, what are the most common targets of removal in Magic: The Gathering? Mistral's answer is so-so, including a slew of cards you would prioritize removing, but also several you typically wouldn't, including things like Mox Amber, a 0-cost mana rock. Gemini flash gave far fewer examples, one for each major card type, but all of them are definitely priority targets that often defined an entire metagame, like Tarmogoyf.

replies(1): >>45131363 #
4. barrell ◴[] No.45131363{3}[source]
Ah yeah. I’m only grading it on its prose, formatting, ability to interpret data, and instruction following. I do not use it as a store of knowledge.