397 points | Anon84

barrell ◴[] No.45126116[source]
I recently upgraded a large portion of my pipeline from gpt-4.1-mini to gpt-5-mini. The performance was horrible, so after some research I decided to move everything to mistral-medium-0525.

Same price, but dramatically better results, way more reliable, and 10x faster. The only downside is that when it does fail, it seems to fail much harder. Where gpt-5-mini would disregard the formatting in the prompt 70% of the time, mistral-medium follows it 99% of the time, but the other 1% of the time it inserts random characters (for whatever reason, usually backticks... which then causes its own formatting issues).

Still, very happy with Mistral so far!

replies(11): >>45126199 #>>45126266 #>>45126479 #>>45126528 #>>45126707 #>>45126741 #>>45126840 #>>45127790 #>>45129028 #>>45130298 #>>45136002 #
mark_l_watson ◴[] No.45126266[source]
It is such a common pattern for LLMs to surround generated JSON with ```json … ``` that I check for this at the application level and fix it. Ten years ago I would do the same sort of sanity checks on formatting when I used LSTMs to generate synthetic data.
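
A minimal sketch of that application-level fix, assuming Python and the standard json/re modules (the fence-stripping regex is illustrative, not exhaustive):

```python
import json
import re

def parse_llm_json(raw: str):
    """Parse JSON from an LLM reply, tolerating markdown code fences."""
    text = raw.strip()
    # Strip a leading ```json (or bare ```) fence and a trailing ``` fence.
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    return json.loads(text)

print(parse_llm_json('```json\n{"ok": true}\n```'))  # {'ok': True}
```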
replies(9): >>45126463 #>>45126482 #>>45126489 #>>45126578 #>>45127374 #>>45127884 #>>45127900 #>>45128015 #>>45128042 #
mpartel ◴[] No.45128042[source]
Some LLM APIs let you give a schema or regex for the answer. I think it works because LLMs give a probability for every possible next token, and you can filter that list by what the schema/regex allows next.
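
A toy sketch of that mechanism, assuming the per-token probabilities are exposed as a dict (real implementations mask logits inside the sampler; the names here are hypothetical):

```python
import random

def constrained_sample(next_token_probs, prefix, may_complete):
    """One decoding step under a constraint: drop tokens the schema/regex
    rules out, renormalize what is left, and sample from that.

    next_token_probs: dict token -> model probability for the next position
    may_complete(s):  True if some valid output still starts with `s`
    """
    allowed = {t: p for t, p in next_token_probs.items()
               if may_complete(prefix + t)}
    if not allowed:
        raise RuntimeError("constraint admits no next token")
    total = sum(allowed.values())
    r = random.uniform(0, total)
    for tok, p in allowed.items():
        r -= p
        if r <= 0:
            return tok
    return tok  # guard against floating-point leftovers
```

In practice `may_complete` is compiled from the schema or regex into a state machine; for a quick test it could be as simple as `lambda s: "12345".startswith(s)`.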
replies(1): >>45128098 #
hansvm ◴[] No.45128098[source]
Interestingly, that gives a different response distribution from simply regenerating while the output doesn't match the schema.
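
A worked toy example of that difference, with a made-up two-token model and the constraint that the sequence "ab" is invalid. Masking renormalizes locally at each step, regenerating renormalizes over whole valid sequences, and the two disagree:

```python
from itertools import product

# Made-up two-step model over the tokens "a" and "b".
P_FIRST = {"a": 0.9, "b": 0.1}
P_SECOND = {"a": {"a": 0.1, "b": 0.9},
            "b": {"a": 0.9, "b": 0.1}}

def valid(seq):
    return seq != "ab"  # the constraint: "ab" violates the schema

# Regenerate-until-valid: the joint distribution restricted to valid
# sequences and renormalized.
joint = {f + s: P_FIRST[f] * P_SECOND[f][s] for f, s in product("ab", repeat=2)}
z = sum(p for seq, p in joint.items() if valid(seq))
rejection = {seq: p / z for seq, p in joint.items() if valid(seq)}

# Token masking: at each step, drop tokens that cannot lead to a valid
# sequence and renormalize locally.
masked = {}
for f, pf in P_FIRST.items():
    allowed = {s: ps for s, ps in P_SECOND[f].items() if valid(f + s)}
    zf = sum(allowed.values())
    for s, ps in allowed.items():
        masked[f + s] = pf * ps / zf

print("rejection:", rejection)  # {'aa': ~0.47, 'ba': ~0.47, 'bb': ~0.05}
print("masked:   ", masked)     # {'aa': 0.90, 'ba': 0.09, 'bb': 0.01}
```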
replies(2): >>45130238 #>>45131168 #
joshred ◴[] No.45130238[source]
It sounds like they're describing a regex filter applied to the model's beam search. LLMs generate the most probable words, but they frequently track several candidate phrases at a time and revise their combined probability. That lets them self-correct when a high-probability word leads to a low-probability phrase.

I think they're saying that if the highest-probability phrase fails the regex, the LLM can substitute the next most likely candidate (see the sketch below).
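
A toy sketch of that idea (hypothetical helper names; real beam search runs over tokenizer IDs and logits): keep the top candidates by combined log-probability, then return the best finished sequence that passes the filter.

```python
import math

def beam_search(step_probs, steps, beam_width, accepts):
    """Toy beam search: keep the top candidates by combined log-probability,
    then return the best finished sequence that passes the `accepts` filter."""
    beams = [("", 0.0)]  # (sequence so far, total log-probability)
    for _ in range(steps):
        candidates = []
        for seq, lp in beams:
            for tok, p in step_probs(seq).items():
                candidates.append((seq + tok, lp + math.log(p)))
        # A low-probability word inside a high-probability phrase can survive
        # here, which is the self-correction described above.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    # Fall back to the next-best beam whenever a better one fails the filter.
    for seq, lp in beams:
        if accepts(seq):
            return seq
    return None

# e.g.: beam_search(lambda s: {"a": 0.5, "b": 0.5}, steps=3, beam_width=2,
#                   accepts=lambda s: s.endswith("b"))
```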

replies(1): >>45132697 #
stavros ◴[] No.45132697[source]
You're actually applying a grammar to the tokens. If you're outputting, for example, JSON, you know what characters are valid next (because of the grammar), so you just filter out the tokens that don't fit the grammar.
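
A minimal sketch of such a next-character filter, for a deliberately tiny grammar (a hypothetical format `{"id": <digits>}` rather than full JSON, which needs a real parser state machine):

```python
DIGITS = set("0123456789")
LITERAL = '{"id": '

def allowed_next(prefix: str) -> set:
    """Characters the toy grammar '{"id": <digits>}' permits after `prefix`."""
    if len(prefix) < len(LITERAL):
        # Still inside the fixed literal part of the grammar.
        return {LITERAL[len(prefix)]} if LITERAL.startswith(prefix) else set()
    if not prefix.startswith(LITERAL):
        return set()
    body = prefix[len(LITERAL):]
    if body.endswith("}"):
        return set()              # object already closed: generation is done
    if body == "":
        return DIGITS             # need at least one digit
    if all(c in DIGITS for c in body):
        return DIGITS | {"}"}     # another digit, or close the object
    return set()

print(allowed_next(""))           # {'{'}
print(allowed_next('{"id": '))    # the ten digits
print(allowed_next('{"id": 4'))   # digits plus '}'
```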