For something a little different from a coding task, I tried using it in my game: https://www.playintra.win/ (in settings you can select Mercury; the game talks to it through OpenRouter).
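To give a sense of the setup, here's a minimal sketch of what a Mercury call through OpenRouter's OpenAI-compatible chat completions endpoint looks like (the model slug, system prompt, and helper name are placeholders, not the game's real ones):

    // Rough sketch of calling Mercury via OpenRouter (not the game's actual code).
    const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

    async function askMercury(apiKey: string, eventLog: string): Promise<string> {
      const response = await fetch(OPENROUTER_URL, {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          // Assumed slug; use whichever Mercury variant OpenRouter lists.
          model: "inception/mercury",
          messages: [
            { role: "system", content: "You are the narrator. Focus on the present moment." },
            { role: "user", content: eventLog },
          ],
        }),
      });
      const data = await response.json();
      return data.choices[0].message.content;
    }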
At first it seemed pretty competent and of course very fast, but it really fell apart as the context got longer. The context in this case is a sequence of events and locations, and the model needs to understand how those events are ordered and therefore what the current situation and environment are (though there are also lots of hints in the prompts to keep it focused on the present moment). It's challenging, but lots of smaller models can pull it off.
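To illustrate the shape of that context (a toy sketch, not the game's actual data model), it's roughly an ordered event log flattened into the prompt, with a trailing reminder about the present moment:

    interface GameEvent {
      turn: number;
      location: string;
      description: string;
    }

    // Flatten the ordered log into prompt text, ending with a nudge toward
    // the current situation so the model doesn't drift into earlier events.
    function buildContext(log: GameEvent[]): string {
      const history = log
        .map(e => `[turn ${e.turn} @ ${e.location}] ${e.description}`)
        .join("\n");
      const current = log[log.length - 1];
      return `${history}\n\nYou are now at ${current.location}. Respond only about the present moment.`;
    }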
But it's also a first release and a new architecture. Maybe it just needs more time to bake (GPT-3.5 couldn't do these things either). Though I also imagine it might just perform _differently_ from other LLMs, not really on the same spectrum of performance, and require different prompting.