397 points | Anon84 | 1 comment
barrell No.45126116
I recently upgraded a large portion of my pipeline from gpt-4.1-mini to gpt-5-mini. gpt-5-mini's performance was horrible; after some research I decided to move everything to mistral-medium-0525.

Same price, but dramatically better results, way more reliable, and 10x faster. The only downside is that when it does fail, it fails much harder. Where gpt-5-mini would disregard the formatting in the prompt 70% of the time, mistral-medium follows it 99% of the time, but the other 1% of the time it inserts random characters (for whatever reason, normally backticks... which then causes its own formatting issues).
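The stray-backtick failure mode is easy to guard against in post-processing. A minimal sketch (my own, not the poster's code): keep properly paired `code` spans but strip any lone backtick the model inserted at random.

```python
import re

def strip_unpaired_backticks(text: str) -> str:
    """Keep matched `...` inline-code spans; delete stray lone backticks.

    Heuristic only: a stray backtick followed later by another backtick
    will pair with it, so this mainly catches single random insertions.
    """
    # With a capturing group, re.split keeps the matched spans at odd indices.
    parts = re.split(r"(`[^`]*`)", text)
    return "".join(
        p if i % 2 else p.replace("`", "")  # odd index = matched span, keep as-is
        for i, p in enumerate(parts)
    )
```

Running this over every model response before rendering would turn the 1% hard failure into a silently repaired one.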

Still, very happy with Mistral so far!

replies(11): >>45126199 #>>45126266 #>>45126479 #>>45126528 #>>45126707 #>>45126741 #>>45126840 #>>45127790 #>>45129028 #>>45130298 #>>45136002 #
WhitneyLand No.45130298
Were you using structured output with gpt-5-mini?

Is there an example you can show that tended to fail?

I’m curious how constrained decoding could have strayed so far from your desired format.

replies(1): >>45131319 #
barrell No.45131319
Here is an example of the formatting I desired: https://x.com/barrelltech/status/1963684443006066772?s=46&t=...

Yes I use(d) structured output. I gave it very specific instructions and data for every paragraph, and asked it to generate paragraphs for each one using this specific format. For the formatting, I have a large portion of the system prompt detailing it exactly, with dozens of examples.
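Since structured output only constrains the JSON shape, not the markdown inside each string field, per-paragraph validation has to happen afterwards. A hedged sketch of what that could look like; the regex below assumes a hypothetical "**term** (*translation*): explanation" paragraph format, since the poster's actual schema isn't shown:

```python
import re

# Hypothetical expected shape of one generated paragraph.
PARAGRAPH_RE = re.compile(r"^\*\*[^*]+\*\* \(\*[^*]+\*\): .+$")

def find_malformed_paragraphs(output: str) -> list[int]:
    """Return indices of paragraphs that break the expected format,
    so the caller can retry or repair just those instead of the whole batch."""
    bad = []
    for i, para in enumerate(output.strip().split("\n\n")):
        if not PARAGRAPH_RE.match(para.strip()):
            bad.append(i)
    return bad
```

Checking each response this way makes the "uses the formatting maybe once, then freestyles" behavior measurable rather than anecdotal.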

gpt-5-mini would normally use this formatting maybe once, and then just kinda do whatever it wanted for the rest of the time. It would also freestyle and put all sorts of things in the various bold and italic sections (using the language name instead of the translation was one of its favorites) that I’ve never seen Mistral do in the thousands of paragraphs I’ve read. It failed in some other truly spectacular ways too, but going into all of them would just be bashing on gpt-5-mini.

I switched it over to Mistral, and with a bit of tweaking it’s nearly perfect (as perfect as I would expect from an LLM, which is only really 90% sufficient XD)