I also appreciate that they did not attempt to tackle filesystem encoding here, which is a separate issue that drives me nuts, but separately.
For example, on Unix/Linux, iso-8859-1 was common for Western-European languages, and in Europe it became common to switch to iso-8859-15 after the Euro was introduced, because it contained the € symbol. UTF-8 only began to work flawlessly in the later aughts; Debian switched to it as the default with the Etch release in 2007.
Historically it made sense to be locale-dependent, but even then it was annoying to be platform-dependent.
One is not a subset of the other.
IIRC, the main way this brittleness bit me was that every time a buffer containing a non-ASCII character was saved, Emacs would engage me in a tedious, distracting conversation about which coding system I would like to use to save the file. I never found a sane way to configure it to avoid such conversations, even after spending hours learning how Emacs does coding systems: I simply had to wait (a year or 3) for a new version of Emacs in which the code for saving buffers worked better.
I think some people like engaging in these conversations with their computers even though the conversations are very boring and repetitive and that such conversation-likers are numerous among Emacs users or at least Emacs maintainers.
It's also true that essentially nothing on Linux supports the UTF-8 byte order mark. Yes, it's meaningless for UTF-8, but it is explicitly allowed in the specifications. Since Microsoft tends to always include a BOM in any flavor of Unicode, this means Linux often chokes on valid UTF-8 text files from Windows systems.
Just one of those stupid little things you have to remember from time to time. Although why newly written software requires a specific line terminator is a valid question.
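To make that concrete, here's what it looks like from Python, standing in for a typical Unix-side consumer (a minimal sketch; the JSON payload is made up). The stdlib parser rejects a string that still carries the decoded BOM, even though the original bytes were perfectly valid UTF-8:

    import json

    # Bytes the way a Windows tool often writes them: BOM + valid UTF-8.
    raw = b'\xef\xbb\xbf{"euro": "\xe2\x82\xac"}'

    text = raw.decode("utf-8")        # plain utf-8 keeps U+FEFF in the string
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        print(e)                      # Unexpected UTF-8 BOM (decode using utf-8-sig)

    print(json.loads(raw.decode("utf-8-sig")))   # {'euro': '€'} -- BOM stripped first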
It's too bad; with a bit more planning and an earlier realization that Unicode cannot in fact fit into 16 bits, Windows might have used UTF-8 internally.
You need to use the special "utf-8-sig" encoding for that, which is not prominently advertised anywhere in the documentation (but it is stated deep inside the "Unicode HOWTO").
I never understood why ignoring this special character requires a totally separate encoding.
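For what it's worth, the difference is easy to demonstrate (a minimal sketch; the file name is invented):

    # Simulate a file saved by Notepad: UTF-8 with a leading BOM.
    with open("report.txt", "wb") as f:
        f.write(b"\xef\xbb\xbfhello\n")

    with open("report.txt", encoding="utf-8") as f:
        print(repr(f.readline()))     # '\ufeffhello\n' -- the BOM leaks into your data

    with open("report.txt", encoding="utf-8-sig") as f:
        print(repr(f.readline()))     # 'hello\n' -- the BOM is silently dropped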
Most obviously, shebang (the practice of writing #!/path/to/interpreter at the start of a script) is specifically defined on those first two bytes. It doesn't make any sense to have a BOM here because that's not the format, and inventing a new rule later which says you can do it doesn't make that true, any more than in 2024 the German government can decide Germany didn't invade Poland in 1939; that's not how Time's Arrow works.
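You can see why it breaks by looking at the raw bytes (a Python sketch; the script content is illustrative). The kernel inspects exactly the first two bytes, and a BOM shoves "#!" out of position:

    good = b"#!/bin/sh\necho hi\n"
    bad  = b"\xef\xbb\xbf#!/bin/sh\necho hi\n"   # same script, saved "with BOM"

    print(good[:2])   # b'#!'       -- the kernel hands the file to /bin/sh
    print(bad[:2])    # b'\xef\xbb' -- no shebang as far as execve() is concerned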
Because the BOM is indistinguishable from the "real" UTF-8 encoding of U+FEFF (zero-width no-break space). Trimming that codepoint in the UTF-8 decoder means that some strings like "\uFEFF" can't be safely round-tripped; adding it in the encoder is invalid in many contexts.
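The information loss is easy to watch in Python:

    s = "\ufeffdata"                     # a string that legitimately starts with U+FEFF

    b = s.encode("utf-8")                # b'\xef\xbb\xbfdata' -- identical to BOM + "data"
    print(repr(b.decode("utf-8-sig")))   # 'data' -- the trimming decoder ate a real codepoint
    print(b.decode("utf-8") == s)        # True  -- only the strict decoder round-trips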
Windows badly dropped the ball here by not providing a simple opt-in way to make all the Ansi functions (TextOutA, etc.) use the UTF-8 code page until many, many years later, with the manifest file. This should have been a feature introduced in NT4 or Windows 98, not something that was put off until midway through Windows 10's development cycle.
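For the record, the eventual opt-in (Windows 10 1903 or so) is an application-manifest setting along these lines; I'm quoting from memory, so check the SDK docs for the exact schema:

    <?xml version="1.0" encoding="utf-8"?>
    <assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
      <application>
        <windowsSettings>
          <!-- Make the ANSI (-A) entry points, TextOutA included, treat
               strings as UTF-8 instead of the legacy locale code page. -->
          <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
        </windowsSettings>
      </application>
    </assembly>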
Of course, it goes without saying that this only works when the directive comes from all the way at the top. Otherwise there are just too many conflicting incentives for any real change to happen.
While I am on this topic, I want to mention Apple. It is absolutely bonkers how they have done exactly this countless times. Like changing their entire platform architecture! It could have been like opening a can of worms, but they knew what they were doing. Kudos to them.
Also... (sorry, this is becoming a long post) civil and industrial engineering firms routinely pull off projects like that. But the point I wanted to emphasize is that it's very uncommon in tech, which prides itself on having decentralized and semi-autonomous teams rather than centralized and highly aligned ones.
Depends how wide your definition of "poor quality" is. All powershell files (ps1, psm1, psd1) are assumed to be in the local charset unless they have a byte order mark, in which case they're treated as whatever the BOM says.
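Roughly that detection logic, sketched in Python (the function name and the fallback are mine, not anything PowerShell exposes):

    import codecs
    import locale

    # Longest BOMs first, so UTF-32 LE isn't misread as UTF-16 LE
    # (BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE).
    BOMS = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8,     "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]

    def sniff_encoding(raw: bytes) -> str:
        """A BOM wins; otherwise fall back to the machine-local charset."""
        for bom, name in BOMS:
            if raw.startswith(bom):
                return name
        return locale.getpreferredencoding(False)   # the "local charset" part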
Apple has a walled-garden approach to managing their ecosystem, and within the confines of their garden they just do what's necessary. AFAIK, Apple doesn't care about the possibility of running binaries from the '90s on a modern stack.
Edit: even though it's expensive, it's possible to conduct such ecosystem-wide changes if you hold all the cards in your hand. Microsoft was able to reengineer the graphical subsystem somewhere between XP and 8. Doing something like this is magnitudes more difficult on Linux (Wayland says hi). Google could maybe do it within their Android corner, but they generally don't give a sh*t about backwards compatibility.
Recently, I've gotten bitten by UTF-16 (because somewhere along the line, something on a Windows machine generated a file by piping it in PowerShell).
This is an example of poor quality software:
> All powershell files (ps1, psm1, psd1) are assumed to be in the local charset unless they have a byte order mark, in which case they're treated as whatever the BOM says.
PowerShell is not that old. Assuming the local encoding is inexcusable here.
I don't think the walled garden makes much of a difference when it comes to compatibility on, say, macOS. They still have to carefully weigh the ecosystem-wide cost of deprecating old APIs against the ecosystem-wide long-term benefits. Yes the decision remains entirely their own, but a lot of stakeholders indirectly weigh on the decision.
GTK and Qt also make backwards-incompatible new versions as they evolve. The biggest difference here is that in theory someone could keep maintaining the old library code if they decided that updating their application code was always going to be harder. How rarely this actually happens gives weight to the argument that developers can accept occasional API overhauls in exchange for staying on the well-supported low-tech-debt path.
So walled or open made no difference here, even in the open platform, application developers are largely at the mercy of where development effort on libraries and frameworks is going. Nobody can afford to make their own exclusive frameworks to an acceptable standard, and if we want to get away from the technical debt of the 90s then the shared frameworks have to make breaking changes occasionally and strategically.
> AFAIK, Apple doesn't care about the possibilty to run binaries from the '90s on a modern stack.
Definitely, and I don't either. It's kind of a silver lining that Apple wasn't the enterprise heavy-hitter that Microsoft was at the time, because if it had been, its entire culture and landscape would be shaped by it like Microsoft's was. I think we have enough of that in the industry already.
When an old platform is that old, it's really hard to justify making it a seamless subset of the modern platform, and it makes more sense to talk about some form of virtualization. This is where even Windows falls down on both counts. How well modern Windows runs old software is far more variable than people assume until they try it. Anything with 8-bit colors may not work at all.
VirtualBox, qemu, etc. have increasingly poor support for DOS-based Windows (95, 98, ME) because not enough people care about that even in the context of virtualization. After trying every free virtualization option to run some 90s Windows software, I ended up finding that WINE was more compatible with that era than modern Windows is, without any of the jank of running a real Windows in qemu or VirtualBox.
So even with the OS most famous for backwards compatibility, and the enormous technical debt that carries, compatibility has been slowly sliding, to the point of being worse than open-source projects with no direct lineage to the same platform and no commercial motives.
It's perfectly justifiable to reset technical debt here, whether walled or open. If people have enough need to run old software, there should be a market of solutions to that problem, yet it generally remains niche or hobbyist, and even the big commercial vendors overestimate how well they're doing it.