
238 points GalaxySnail | 1 comment
a-french-anon ◴[] No.40170353[source]
Why not utf-8-sig, though? It handles optional BOMs. Had to fix a script last week that choked on it.
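
For concreteness, a minimal sketch of the difference (the file name is made up): the same BOM-prefixed file read with plain utf-8 versus utf-8-sig.

    with open("data.csv", "w", encoding="utf-8-sig") as f:
        f.write("name,value\n")        # utf-8-sig prepends EF BB BF on write

    with open("data.csv", encoding="utf-8") as f:
        print(repr(f.readline()))      # '\ufeffname,value\n' -- BOM leaks into the data

    with open("data.csv", encoding="utf-8-sig") as f:
        print(repr(f.readline()))      # 'name,value\n' -- BOM skipped on read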
replies(3): >>40170707 #>>40170832 #>>40171048 #
orf ◴[] No.40170832[source]
Because changing Python to silently prefix all IO with an invisible BOM isn’t a good idea.
replies(1): >>40174582 #
int_19h ◴[] No.40174582[source]
The expectation isn't for it to generate BOM in the output, but to handle BOM gracefully when it occurs in the input.
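
One way to get that behaviour with the codecs that already exist (paths are just placeholders): decode with utf-8-sig, which skips an optional BOM on input, and encode with plain utf-8 so nothing extra is written on output.

    with open("in.txt", encoding="utf-8-sig") as src, \
         open("out.txt", "w", encoding="utf-8") as dst:
        dst.write(src.read())          # tolerates a BOM on input, never emits one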
replies(2): >>40176709 #>>40176715 #
shpx ◴[] No.40176715[source]
> On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file

https://docs.python.org/3/library/codecs.html

The codec you're imagining would also make reading a file and writing it back change the file if it contains a BOM.
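
To spell out the roundtrip point (a sketch, modelling the imagined codec as utf-8-sig on decode plus plain utf-8 on encode):

    original = b"\xef\xbb\xbfhello\n"                         # file starts with a BOM
    roundtrip = original.decode("utf-8-sig").encode("utf-8")
    print(roundtrip == original)   # False: the BOM is silently dropped
    print(roundtrip)               # b'hello\n'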

replies(1): >>40178806 #
int_19h ◴[] No.40178806[source]
Indeed it would, but since codecs are only used for files that are semantically text, and in such files a BOM is basically a legacy no-op marker, it's not actually a problem. Naive code using text I/O APIs would also have this issue with line endings, for example, so there's precedent for not providing a perfect roundtrip experience (that's what bytes I/O is for).
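
For the line-ending precedent, a small sketch (file name made up): text mode already normalizes CRLF to '\n' on read and writes os.linesep back out, so a naive read-and-rewrite can change the bytes on disk; an exact roundtrip needs binary mode.

    with open("crlf.txt", "wb") as f:
        f.write(b"a\r\nb\r\n")                    # CRLF line endings on disk

    with open("crlf.txt", encoding="utf-8") as f:
        text = f.read()
    print(repr(text))                             # 'a\nb\n' -- the CRLFs are already gone

    with open("crlf.txt", "w", encoding="utf-8") as f:
        f.write(text)                             # writes os.linesep; on Linux/macOS the
                                                  # file is now b'a\nb\n', not the original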