
238 points GalaxySnail | 1 comment
a-french-anon ◴[] No.40170353[source]
Why not utf-8-sig, though? It handles optional BOMs. Had to fix a script last week that choked on it.
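
For concreteness, a minimal sketch of the difference (the file name is made up): the same BOM-prefixed file read with plain utf-8 versus utf-8-sig.

    with open("data.csv", "w", encoding="utf-8-sig") as f:
        f.write("name,value\n")        # utf-8-sig prepends EF BB BF on write

    with open("data.csv", encoding="utf-8") as f:
        print(repr(f.readline()))      # '\ufeffname,value\n' -- BOM leaks into the data

    with open("data.csv", encoding="utf-8-sig") as f:
        print(repr(f.readline()))      # 'name,value\n' -- BOM skipped on read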
replies(3): >>40170707 #>>40170832 #>>40171048 #
orf ◴[] No.40170832[source]
Because changing Python to silently prefix all IO with an invisible BOM isn’t a good idea.
replies(1): >>40174582 #
int_19h ◴[] No.40174582[source]
The expectation isn't for it to generate BOM in the output, but to handle BOM gracefully when it occurs in the input.
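
One way to get that behaviour with the codecs that already exist (paths are just placeholders): decode with utf-8-sig, which skips an optional BOM on input, and encode with plain utf-8 so nothing extra is written on output.

    with open("in.txt", encoding="utf-8-sig") as src, \
         open("out.txt", "w", encoding="utf-8") as dst:
        dst.write(src.read())          # tolerates a BOM on input, never emits one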
replies(2): >>40176709 #>>40176715 #
shpx ◴[] No.40176715[source]
> On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file

https://docs.python.org/3/library/codecs.html

The codec you're imagining would also make reading a file and writing it back change the file if it contains a BOM.
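
To spell out the roundtrip point (a sketch, modelling the imagined codec as utf-8-sig on decode plus plain utf-8 on encode):

    original = b"\xef\xbb\xbfhello\n"                         # file starts with a BOM
    roundtrip = original.decode("utf-8-sig").encode("utf-8")
    print(roundtrip == original)   # False: the BOM is silently dropped
    print(roundtrip)               # b'hello\n'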

replies(1): >>40178806 #
int_19h ◴[] No.40178806[source]
Indeed it would, but since codecs are only used for files that are semantically text, and in such files a BOM is basically a legacy no-op marker, it's not actually a problem. Naive code using text I/O APIs would also have this issue with line endings, for example, so there's precedent for not providing a perfect roundtrip experience (that's what bytes I/O is for).
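
For the line-ending precedent, a small sketch (file name made up): text mode already normalizes CRLF to '\n' on read and writes os.linesep back out, so a naive read-and-rewrite can change the bytes on disk; an exact roundtrip needs binary mode.

    with open("crlf.txt", "wb") as f:
        f.write(b"a\r\nb\r\n")                    # CRLF line endings on disk

    with open("crlf.txt", encoding="utf-8") as f:
        text = f.read()
    print(repr(text))                             # 'a\nb\n' -- the CRLFs are already gone

    with open("crlf.txt", "w", encoding="utf-8") as f:
        f.write(text)                             # writes os.linesep; on Linux/macOS the
                                                  # file is now b'a\nb\n', not the original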