
238 points GalaxySnail | 1 comments
a-french-anon ◴[] No.40170353[source]
Why not utf-8-sig, though? It handles optional BOMs. Had to fix a script last week that choked on it.
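For anyone who hasn't run into it, here is a minimal sketch of the difference in Python, assuming the script in question was decoding bytes with the plain utf-8 codec. The sample bytes and variable names are made up for illustration.

```python
# Bytes as a BOM-inserting editor might write them: the 3-byte UTF-8 BOM, then the text.
with_bom = b"\xef\xbb\xbfhello"
without_bom = b"hello"

# Plain "utf-8" keeps the BOM as a leading U+FEFF character in the decoded string.
decoded_plain = with_bom.decode("utf-8")

# "utf-8-sig" drops a leading BOM if there is one, and otherwise acts like "utf-8".
decoded_sig = with_bom.decode("utf-8-sig")
decoded_sig_no_bom = without_bom.decode("utf-8-sig")

# Note the asymmetry: encoding with "utf-8-sig" always writes a BOM.
encoded_sig = "hello".encode("utf-8-sig")
```

So utf-8-sig is lenient when reading, but it is probably not what you want as a default for writing.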
replies(3): >>40170707 #>>40170832 #>>40171048 #
shellac ◴[] No.40171048[source]
At this point nothing ought to be inserting BOMs in utf-8. It's not recommended, and I think choking on it is reasonable behaviour these days.
replies(3): >>40171192 #>>40173969 #>>40178398 #
Athas ◴[] No.40171192[source]
Why were BOMs ever allowed for UTF-8?
replies(5): >>40171419 #>>40171452 #>>40172241 #>>40175549 #>>40177110 #
1. stubish ◴[] No.40177110[source]
An attempt to store the encoding needed to decode the data with the data, rather than requiring the reader to know it somehow. Your program wouldn't have to care if its source data had been encoded as UTF-8, UTF-16, UTF-32 or some future standard. It's the usual sort of compromise that comes out of committees: in this case, every committee member wanted to be able to spit their preferred in-memory Unicode string representation to disk with no encoding overhead.
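A rough sketch of what that looks like from the reader's side, in Python. The helper names `sniff_bom` and `decode_with_bom` are hypothetical, and falling back to UTF-8 when there is no BOM is my assumption, not something any standard requires.

```python
import codecs

# Check the longer signatures first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8") -> tuple[str, int]:
    """Return (encoding name, BOM length) guessed from the leading bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM found: fall back to the assumed default.
    return default, 0


def decode_with_bom(data: bytes) -> str:
    """Decode data using whatever encoding its BOM announces, minus the BOM."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding)
```

Usage would be something like `decode_with_bom(open(path, "rb").read())`. One caveat: UTF-16-LE text whose first character after the BOM is U+0000 would be misread as UTF-32-LE, so the scheme is not fully unambiguous.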