Being able to control how many tokens are spent on thinking is a game-changer. I've been building fairly complex, efficient, systems with many LLMs. Despite the advantages, reasoning models have been a no-go due to how variable the cost is, and how hard that makes it to calculate a final per-query cost for the customer. Being able to say "I know this model can always solve this problem in this many thinking tokens" and thus limiting the cost for that component is huge.
replies(1):