You either get the same (in this case wrong) thing differently worded, or, worse, you get effectively noise if the second-highest probability is far below the largest one.
My guess is that applies here too. Better to let all the layers rethink the tokens than to force a hallucination of, e.g., a random letter when you weren't expecting an angle bracket.
(Edit: the above assumes you're using logprobs and/or logit_bias with the OpenAI API, not some other masking technique)
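For anyone who hasn't played with it, here's roughly what I mean, as a minimal sketch with the Python SDK. The model name and the banned characters are just for illustration, and a real mask would have to cover every token that contains the character, not just the single-character ones:

    # Sketch of "masking" via logit_bias: push a handful of token ids to -100
    # so the model effectively can't emit them. Illustrative only.
    import tiktoken
    from openai import OpenAI

    client = OpenAI()
    enc = tiktoken.encoding_for_model("gpt-4o")

    # Suppose we never want a bare angle bracket in the output.
    # (Simplified: tokens like " <" or "</" would also need banning.)
    banned = {str(tok): -100 for ch in ("<", ">") for tok in enc.encode(ch)}

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this document..."}],
        logit_bias=banned,
        logprobs=True,      # lets you inspect what the model "wanted" to say
        top_logprobs=5,
    )
    print(resp.choices[0].message.content)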
If you’re driving user-facing interactions with LLMs, though, and you’re already dealing with >1min latency on the first call (as many of our current users are!), waiting for another LLM call to come back is a really frustrating thing to block your UX on.
You're right, though, that reprompting works with pretty much everything out there, including hosted models that don't expose tool use in their API. And it's simple, too: you don't even need to know what "token masking" is.
Reprompting can also enforce arbitrary criteria that are more complex than just a JSON schema. You ask it to choose an excerpt of a document and the string it returns isn't actually an excerpt? Just reprompt.
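Something like this sketch is all it takes; the model name, prompts, and the substring check are placeholders for whatever criterion you actually care about:

    # Toy reprompt loop: ask for a verbatim excerpt, verify it locally, and if
    # the check fails, feed the failure back and ask again.
    from openai import OpenAI

    client = OpenAI()

    def get_excerpt(document: str, max_tries: int = 3) -> str | None:
        messages = [
            {"role": "user",
             "content": "Quote one sentence verbatim from this document:\n\n" + document},
        ]
        for _ in range(max_tries):
            resp = client.chat.completions.create(model="gpt-4o", messages=messages)
            answer = resp.choices[0].message.content.strip().strip('"')
            if answer in document:   # the arbitrary criterion: must be a real excerpt
                return answer
            # Explain the failure and reprompt.
            messages.append({"role": "assistant", "content": answer})
            messages.append({"role": "user",
                             "content": "That string doesn't appear verbatim in the document. Try again."})
        return None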