An attempt to store the encoding needed to decode the data with the data, rather than requiring the reader to know it somehow. Your program wouldn't have to care if its source data had been encoded as UTF-8, UTF-16, UTF-32 or some future standard. The usual sort of compromise that comes out of committees, in this case where every committee member wanted to be able to spit their preferred in-memory Unicode string representation to disk with no encoding overhead.