Cannot believe it's efficient. XML is the most verbose and inefficient way of communicating anything. The only benefit of XML was to give lifetime work to an army of engineers. The next news will be "Why DTD is so fundamental to Claude".
The point isn't to be efficient. If you train an LLM on code with an example execution trace written in the comments, the LLM gains a better understanding due to the additional context in the data. LLMs don't have a real world model. For them, the token space is the real world. All the information needs to be present in the training data and XML makes it easy because it is verbose and explicit about everything.
When you're tokenizing, it doesn't really matter what representation you use (how each token translates to and from a text string); the main thing is the overall number of tokens. XML is particularly amenable to tokenization because it is trivial to represent an entire tag as a single token (or a pair of tokens, one for the open tag, one for the close).
It gets a bit muddier with attributes, but you can still capture the core semantics of the tag with a single token. The model will learn that tag's attributes from seeing the tag used in the training data.
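To make the single-token-per-tag idea concrete, here is a minimal Python sketch. The vocabulary and the `<example>` tag name are hypothetical, and real tokenizers (BPE etc.) are far more sophisticated; this only illustrates how a matched open/close tag pair can each be assigned one token ID while ordinary text falls back to finer-grained pieces.

```python
import re

# Toy vocabulary: each whole tag gets its own token ID.
vocab = {"<example>": 0, "</example>": 1}

def tokenize(text):
    """Greedily match known tags as single tokens; fall back to characters."""
    pattern = re.compile(
        "|".join(re.escape(t) for t in vocab) + r"|.", re.DOTALL
    )
    tokens = []
    for match in pattern.finditer(text):
        piece = match.group(0)
        if piece in vocab:
            tokens.append(vocab[piece])       # whole tag -> one token
        else:
            tokens.append(len(vocab) + ord(piece))  # character fallback
    return tokens

ids = tokenize("<example>hi</example>")
# Open tag, 'h', 'i', close tag: the two tags cost one token each,
# so the whole string is just four tokens.
```

The point of the sketch: because tags are fixed, frequently repeated strings, a trained tokenizer naturally merges them into single units, which is why verbose XML markup is cheaper in token terms than its character count suggests.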