
238 points GalaxySnail | 6 comments
Macha ◴[] No.40168944[source]
> And many other popular programming languages, including Node.js, Go, Rust, and Java uses UTF-8 by default.

Oh, I missed Java moving from UTF-16 to UTF-8.

replies(3): >>40169012 #>>40169499 #>>40169530 #
rootext ◴[] No.40169499[source]
It seems you are mixing up two things: the internal string representation and the read/write encoding. Java has never used UTF-16 as the default for the latter.
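A sketch of the distinction, in Python purely for illustration: the internal representation of a string is an implementation detail, and an encoding only comes into play when bytes cross an I/O boundary.

```python
# The same string, serialized under two different encodings.
s = "café"

utf8_bytes = s.encode("utf-8")       # 5 bytes: 'é' takes two bytes in UTF-8
utf16_bytes = s.encode("utf-16-le")  # 8 bytes: two bytes per UTF-16 code unit

print(len(utf8_bytes), len(utf16_bytes))  # 5 8
# Both byte sequences round-trip to the same string:
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == s
```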
replies(2): >>40170808 #>>40176650 #
1. cryptonector ◴[] No.40170808[source]
Not even on Windows?
replies(1): >>40171134 #
2. layer8 ◴[] No.40171134[source]
No, file I/O on Windows in general doesn’t use UTF-16, but the regional (ANSI) code page, or nowadays UTF-8 if the application opts in.
replies(1): >>40174570 #
3. int_19h ◴[] No.40174570[source]
Depends on what you define as "file I/O", though. NTFS filenames are UTF-16 (or rather UCS-2). As for file contents, there isn't really a standard, but FWIW, for a long time most Windows apps (Notepad being the canonical example) would save anything as UTF-16 when asked to save it as "Unicode".
replies(1): >>40174958 #
4. layer8 ◴[] No.40174958{3}[source]
I'm talking about the default behavior of Microsoft's C runtime (MSVCRT.DLL) that everyone is/was using.

UTF-16 text files are rather rare, as is using Notepad's UTF-16 options. The only semi-common use I know of is *.reg files saved from regedit. One issue with UTF-16 is that it has two different serializations (BE and LE), and hence generally requires a BOM to disambiguate.
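A quick sketch of the BOM point (in Python, for illustration): the same string has two byte-order serializations, and the BOM (U+FEFF) is what lets a decoder tell them apart.

```python
s = "Ω"  # U+03A9

le = s.encode("utf-16-le")  # b'\xa9\x03'
be = s.encode("utf-16-be")  # b'\x03\xa9'
assert le != be             # same text, two incompatible serializations

# The plain "utf-16" codec prepends a BOM in the platform's native order:
with_bom = s.encode("utf-16")
assert with_bom[:2] in (b"\xff\xfe", b"\xfe\xff")
# With the BOM present, the decoder recovers the string regardless of byte order:
assert with_bom.decode("utf-16") == s
```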

replies(2): >>40176676 #>>40178858 #
5. TheCycoONE ◴[] No.40176676{4}[source]
PowerShell used to output UTF-16 by default on Windows. It might still, but it's been a while since I needed to check.
6. int_19h ◴[] No.40178858{4}[source]
Then you're talking about the C stdlib, which, yeah, is meant to use the locale-specific encoding on any platform, so it's not really a Windows thing specifically. But even then, someone could use the CRT but call _wfopen() rather than fopen(), etc.; this was actually not uncommon for Windows software, precisely because it let you handle Unicode without having to work with the Win32 API directly.
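The "locale-specific encoding" behavior isn't unique to the C stdlib; as a portable illustration, Python's text I/O historically falls back to the same locale-dependent default when no encoding is given (unless UTF-8 mode is enabled):

```python
import locale

# What open(path) uses when you don't pass encoding= (pre-UTF-8-mode):
default = locale.getpreferredencoding(False)
print(default)  # e.g. "UTF-8" on most Linux systems, "cp1252" on many Windows setups
```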

Microsoft's implementation of fopen() also supports "ccs=..." to open Unicode text files in Unicode, and interestingly "ccs=UNICODE" will get you UTF-16LE, not UTF-8 (but you can do "ccs=UTF-8"). .NET also has this weird naming quirk where Encoding.Unicode is UTF-16, although there at least UTF-8 is the default for all text I/O classes like StreamReader if you don't specify the encoding. Still, many people didn't know better, and so some early .NET software would use UTF-16 for text I/O for no reason other than its developers believing that Encoding.Unicode is obviously what they are supposed to be using to "support Unicode", and so explicitly passing it everywhere.
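The behavior is easy to observe in a portable way; this Python sketch (illustration only, not the MS CRT) shows what a "UTF-16 text file" like the ones ccs=UNICODE produces actually looks like on disk: a BOM followed by two-byte code units.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write("A")

with open(path, "rb") as f:
    raw = f.read()

assert raw[:2] in (b"\xff\xfe", b"\xfe\xff")  # 2-byte BOM
assert len(raw) == 4                          # BOM + 2-byte "A"
```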