Hmm, that's an interesting way of thinking about it. The way I see it, I trust XML less, because the less information-dense representation gives the model more room to make a mistake: if you think of every token as an opportunity to be correct or wrong, the higher token count needed to represent the same content in XML gives the model more chances to get the output wrong (kinda like the birthday paradox).
(Plus, more output tokens is more expensive!)
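To put made-up numbers on it: if the model independently got each token right 99.9% of the time, a 60-token output would come out clean about 0.999^60 ≈ 94% of the time, while a 76-token output drops to about 0.999^76 ≈ 93%. The per-token rate is invented, but the point stands: the more tokens you need, the more those small per-token error chances stack up.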
e.g. using the cl100k_base tokenizer (the one GPT-4 uses), this JSON is 60 tokens:
{
  "method": "GET",
  "endpoint": "/api/model/details",
  "headers": {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "Content-Type": "application/json"
  },
  "queryParams": {
    "model_id": "12345"
  }
}
whereas this XML is 76 tokens:
<?xml version="1.0" encoding="UTF-8" ?>
<method>GET</method>
<endpoint>/api/model/details</endpoint>
<headers>
  <Authorization>Bearer YOUR_ACCESS_TOKEN</Authorization>
  <Content-Type>application/json</Content-Type>
</headers>
<queryParams>
  <model_id>12345</model_id>
</queryParams>
You can check out the tokenization here by toggling "show tokens":
https://www.promptfiddle.com/json-vs-xml-token-count-BtXe3
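Or, if you'd rather reproduce the counts locally, here's a quick sketch using the tiktoken library (assumes pip install tiktoken; the exact numbers can shift a little depending on the whitespace in your payload):

import tiktoken

# The two payloads from above, pasted verbatim.
json_payload = """{
  "method": "GET",
  "endpoint": "/api/model/details",
  "headers": {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "Content-Type": "application/json"
  },
  "queryParams": {
    "model_id": "12345"
  }
}"""

xml_payload = """<?xml version="1.0" encoding="UTF-8" ?>
<method>GET</method>
<endpoint>/api/model/details</endpoint>
<headers>
  <Authorization>Bearer YOUR_ACCESS_TOKEN</Authorization>
  <Content-Type>application/json</Content-Type>
</headers>
<queryParams>
  <model_id>12345</model_id>
</queryParams>"""

# cl100k_base is the encoding GPT-4 uses.
enc = tiktoken.get_encoding("cl100k_base")
print("JSON tokens:", len(enc.encode(json_payload)))
print("XML tokens: ", len(enc.encode(xml_payload)))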