    397 points Anon84 | 20 comments
    barrell ◴[] No.45126116[source]
    I recently upgraded a large portion of my pipeline from gpt-4.1-mini to gpt-5-mini. The performance was horrible - after some research I decided to move everything to mistral-medium-0525.

    Same price, but dramatically better results, way more reliable, and 10x faster. The only downside is that when it does fail, it seems to fail much harder. Where gpt-5-mini would disregard the formatting in the prompt 70% of the time, mistral-medium follows it 99% of the time, but the other 1% of the time it inserts random characters (for whatever reason, normally backticks... which then cause their own formatting issues).

    Still, very happy with Mistral so far!

    replies(11): >>45126199 #>>45126266 #>>45126479 #>>45126528 #>>45126707 #>>45126741 #>>45126840 #>>45127790 #>>45129028 #>>45130298 #>>45136002 #
    1. mark_l_watson ◴[] No.45126266[source]
    It is such a common pattern for LLMs to surround generated JSON with ```json … ``` that I check for this at the application level and fix it. Ten years ago I would do the same sort of sanity checks on formatting when I used LSTMs to generate synthetic data.
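
    A minimal sketch of that application-level fix in Python (assuming the fence, when it appears, wraps the whole response):

    ```python
    import json
    import re

    def parse_llm_json(text: str):
        """Strip a leading code fence (with or without a "json" tag) and a trailing fence, then parse."""
        cleaned = text.strip()
        cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)   # drop an opening fence
        cleaned = re.sub(r"\s*```$", "", cleaned)            # drop a closing fence
        return json.loads(cleaned)

    print(parse_llm_json('```json\n{"ok": true}\n```'))  # {'ok': True}
    ```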
    replies(9): >>45126463 #>>45126482 #>>45126489 #>>45126578 #>>45127374 #>>45127884 #>>45127900 #>>45128015 #>>45128042 #
    2. Alifatisk ◴[] No.45126463[source]
    I think this is the first time I stumbled upon someone who actually mentions LSTMs in a practical way instead of just theory. Cool!

    Would you like to elaborate further on how the experience was with it? What was your approach for using it? How did you generate synthetic data? How did it perform?

    replies(1): >>45127590 #
    3. barrell ◴[] No.45126482[source]
    Yeah, that’s infuriating. They’re getting better now with structured data, but it’s going to be a never-ending battle getting reliable data structures from an LLM.

    This is maybe more, maybe less insidious. It will literally just insert a random character into the middle of a word.

    I work with an app that supports 120+ languages though. I give the LLM translations, transliterations, grammar features, etc. and ask it to explain them in plain English. So it’s constantly switching between multiple real languages, and sometimes fake ones (transliterations). I don’t think most users would experience this.

    4. Alifatisk ◴[] No.45126489[source]
    I do use backticks a lot when sharing examples in different formats with LLMs, and I have instructed them to do likewise; I also upvote whenever they respond in that manner.

    I got this format from writing markdown files, it’s a nice way to share examples and also specify which format it is.

    5. viridian ◴[] No.45126578[source]
    I'm sure the reason is the plethora of markdown data it was trained on. I personally use ``` stuff.txt ``` extremely frequently, in a variety of places.

    In Slack/Teams I do it with anything someone might copy and paste, to ensure that the chat client doesn't do something horrendous like replace my ASCII double quotes with the fancy Unicode ones that cause syntax errors.

    In readme files any example path, code, yaml, or json is wrapped in code quotes.

    In my personal (text file) notes I also use ``` {} ``` to denote a code block I'd like to remember, just out of habit from the other two above.

    replies(1): >>45127290 #
    6. accrual ◴[] No.45127290[source]
    Same. For me it's almost like a symbiotic thing. After using LLMs for a couple of years I noticed I use code blocks/backticks a lot more often. It's helpful for me as an inline signal like "this is a function name or hostname or special keyword", but it's also helpful for other people in Teams/Slack and for LLMs alike.
    replies(1): >>45128349 #
    7. mejutoco ◴[] No.45127374[source]
    Funny, I do the same. Additionally, one can define a JSON schema for the output and try to load the response as JSON, retrying a number of times. If it is not valid JSON or the schema is not followed, we discard it and retry.

    It also helps to have a field in the JSON for confidence, or a similar pattern, to act as a cutoff for which responses are accepted.
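
    A sketch of that validate-and-retry loop, assuming a hypothetical call_llm(prompt) helper and the jsonschema package (field names and thresholds are made up):

    ```python
    import json

    from jsonschema import ValidationError, validate  # pip install jsonschema

    SCHEMA = {
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["answer", "confidence"],
    }

    def ask_with_retries(call_llm, prompt, max_tries=3, min_confidence=0.7):
        """call_llm is a stand-in for whatever returns the raw model text."""
        for _ in range(max_tries):
            raw = call_llm(prompt)
            try:
                data = json.loads(raw)
                validate(instance=data, schema=SCHEMA)
            except (json.JSONDecodeError, ValidationError):
                continue  # not valid JSON or schema mismatch: discard and retry
            if data["confidence"] >= min_confidence:
                return data  # accept only sufficiently confident responses
        return None  # caller decides what to do after exhausting retries
    ```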

    8. p1esk ◴[] No.45127590[source]
    10 years ago I used LSTMs for music generation. Worked pretty well for short MIDI snippets (30-60 seconds).
    9. fumeux_fume ◴[] No.45127884[source]
    Very common struggle, but a great way to prevent that is prefilling the assistant response with "{" or with as much of the JSON output as you know ahead of time, like '{"response": ['.
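
    Roughly what that prefill looks like with an Anthropic-style messages API (the model name, prompt, and response access here are assumptions; other providers expose the same idea differently):

    ```python
    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    prefill = '{"response": ['
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name; use whatever you run
        max_tokens=512,
        messages=[
            {"role": "user", "content": "Return the items as JSON with a 'response' array."},
            # Prefilling the assistant turn makes the model continue from here,
            # so it can't open a code fence or add preamble first.
            {"role": "assistant", "content": prefill},
        ],
    )
    full_json = prefill + resp.content[0].text  # stitch the prefill back onto the output
    ```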
    replies(2): >>45128284 #>>45128591 #
    10. freehorse ◴[] No.45127900[source]
    I had similar issues with local models, ended up actually requesting the backticks because it was easier that way, and parsed the output accordingly. I cached a prompt with explicit examples of how to structure the data, and reused this over and over. I have found that without examples in the prompt some LLMs are very unreliable, but with caching some example prompts this becomes a non-issue.
    11. tosh ◴[] No.45128015[source]
    I think most mainstream APIs by now have a way for you to conform the generated answer to a schema.
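
    For example, the OpenAI Python SDK accepts a JSON-schema response format on chat completions (parameter shapes vary by provider and SDK version; the model name is just a placeholder):

    ```python
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    schema = {
        "type": "object",
        "properties": {"label": {"type": "string"}, "score": {"type": "number"}},
        "required": ["label", "score"],
        "additionalProperties": False,
    }

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any model with structured-output support
        messages=[{"role": "user", "content": "Classify: 'great product, slow shipping'"}],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "classification", "schema": schema, "strict": True},
        },
    )
    print(resp.choices[0].message.content)  # parses against the schema
    ```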
    12. mpartel ◴[] No.45128042[source]
    Some LLM APIs let you give a schema or regex for the answer. I think it works because LLMs give a probability for every possible next token, and you can filter that list by what the schema/regex allows next.
    replies(1): >>45128098 #
    13. hansvm ◴[] No.45128098[source]
    Interestingly, that gives a different response distribution from simply regenerating while the output doesn't match the schema.
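
    A toy calculation (made-up numbers) of why the two differ: say the first token is A with probability 0.9 or B with probability 0.1, continuations after A are schema-valid 10% of the time, and continuations after B are always valid.

    ```python
    # Rejection sampling: generate freely, throw away outputs that fail the schema,
    # which renormalizes over whole sequences.
    p_A, p_B = 0.9, 0.1
    p_valid_given_A, p_valid_given_B = 0.1, 1.0

    p_A_rejection = (p_A * p_valid_given_A) / (p_A * p_valid_given_A + p_B * p_valid_given_B)

    # Token-level masking: at step 1 neither A nor B is masked (both can still lead
    # to a valid output), so A keeps its original probability; invalid continuations
    # are only pruned at step 2.
    p_A_masked = p_A

    print(f"P(starts with A) under rejection sampling: {p_A_rejection:.2f}")  # ~0.47
    print(f"P(starts with A) under token masking:      {p_A_masked:.2f}")     # 0.90
    ```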
    replies(2): >>45130238 #>>45131168 #
    14. psadri ◴[] No.45128284[source]
    Haven’t tried this. Does it mix well with tool calls? Or does it force a response where you might have expected a tool call?
    replies(1): >>45129068 #
    15. OJFord ◴[] No.45128349{3}[source]
    I'm the opposite, always been pretty good about doing that in Slack etc. (or even here where it doesn't affect the rendering) but I sometimes don't bother in LLM chat.
    16. XenophileJKO ◴[] No.45128591[source]
    Just to be clear for anyone reading this, the optimal way to do this is schema-enforced inference. You can only get a parsable response. There are failure modes, but you don't have to mess with parsing at all.
    17. fumeux_fume ◴[] No.45129068{3}[source]
    It'll force a response that begins with an open bracket. So if you might need a response with a tool call that doesn't start with "{", then it might not fit your workflow.
    18. joshred ◴[] No.45130238{3}[source]
    It sounds like they are describing a regex filter being applied to the model's beam search. LLMs generate the most probable words, but they are frequently tracking several candidate phrases at a time and revising their combined probability. That lets them self-correct if a high-probability word leads to a low-probability phrase.

    I think they are saying that if the highest-probability phrase fails the regex, the LLM is able to substitute the next most likely candidate.

    replies(1): >>45132697 #
    19. Rudybega ◴[] No.45131168{3}[source]
    This is true, but there are methods that greatly reduce this effect and generate results that match or even improve overall output accuracy:

    e.g. DOMINO https://arxiv.org/html/2403.06988v1

    20. stavros ◴[] No.45132697{4}[source]
    You're actually applying a grammar to the token. If you're outputting, for example, JSON, you know what characters are valid next (because of the grammar), so you just filter out the tokens that don't fit the grammar.
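
    A toy illustration of that filtering step (made-up vocabulary and probabilities; real implementations track a grammar state machine rather than a startswith check):

    ```python
    # The model's candidate next tokens with their probabilities (invented numbers).
    next_token_probs = {'Sure': 0.60, '```': 0.25, '{': 0.10, '{"': 0.05}

    def legal_json_start(token: str) -> bool:
        # Stand-in for a real grammar check: at the start of a JSON object,
        # only tokens beginning with '{' are allowed.
        return token.startswith('{')

    allowed = {t: p for t, p in next_token_probs.items() if legal_json_start(t)}
    total = sum(allowed.values())
    renormalized = {t: p / total for t, p in allowed.items()}

    print(renormalized)  # {'{': 0.666..., '{"': 0.333...}
    ```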