This is a surprisingly good idea. The model-vs-model comparison is fun, but not really that useful.
But this could be a legitimate way to design apps in general if you could tell the models what you liked and didn't like.
replies(1):
/vote: Your prompt will be answered by four random, anonymous models. You pick the one you prefer and crown the winner, tournament-style.
/leaderboard: See the current winning models, as dictated by voter preferences.
/play: Iterate quickly by seeing four models respond to the same input and pressing space to regenerate the results you don’t lock in.
We were especially impressed with the quality of DeepSeek and Grok, and with how much quality varies between categories (judging by the results so far, OpenAI is very good for game dev, but seems to suck everywhere else).
We’ve learned a lot, and are curious to hear your comments and questions. Excited to make this better!