I also appreciate that they did not attempt to tackle filesystem encoding here, which is a separate issue that drives me nuts, but separately.
I also appreciate that they did not attempt to tackle filesystem encoding here, which is a separate issue that drives me nuts, but separately.
For example, on Unix/Linux, using iso-8859-1 was common when using Western-European languages, and in Europe it became common to switch to iso-8859-15 after the Euro was introduced, because it contained the € symbol. UTF-8 only began to work flawlessly in the later aughts. Debian switched to it as the default with the Etch release in 2010.
It's also true that essentially nothing on Linux supports the UTF-8 byte order mark. Yes, it's meaningless for UTF-8, but it is explicitly allowed in the specifications. Since Microsoft tends to always include a BOM in any flavor of Unicode, this means Linux often chokes on valid UTF-8 text files from Windows systems.
You need to use the special "utf-8-sig" encoding for that, which is not prominently advertised anywhere in the documentation (but it is stated deep inside the "Unicode HOWTO").
I never understood why ignoring this special character requires a totally separate encoding.
Most obviously shebang (the practice of writing #!/path/to/interpreter at the start of a script) is specifically defined on those first two bytes. It doesn't make any sense have a BOM here because that's not the format, and inventing a new rule later which says you can do it doesn't make that true, any more than in 2024 the German government can decide Germany didn't invade Poland in 1939, that's not how Time's Arrow works.
Because the BOM is indistinguishable from the "real" UTF-8 encoding of U+FEFF (zero-width no-break space). Trimming that codepoint in the UTF-8 decoder means that some strings like "\uFEFF" can't be safely round-tripped; adding it in the encoder is invalid in many contexts.
Depends how wide your definition of "poor quality" is. All powershell files (ps1, psm1, psd1) are assumed to be in the local charset unless they have a byte order mark, in which case they're treated as whatever the BOM says.
This is an example of poor quality software:
> All powershell files (ps1, psm1, psd1) are assumed to be in the local charset unless they have a byte order mark, in which case they're treated as whatever the BOM says.
Powershell is not that old. Assuming local encoding is inexcusable here.