
178 points dgl | 3 comments
b0a04gl ◴[] No.44362767[source]
emoji width bugs mostly come down to how terminals interpret Unicode's "grapheme clusters" vs "codepoints" vs "display cells". emoji isn't one codepoint - it's often multiple joined by zero-width joiners, variation selectors, skin tone modifiers. so the terminal asks wcwidth(), gets 1 or 2, but the actual glyph might render wider or combine into a single shape.

some emoji even change width depending on font. family emoji is like 7 codepoints, shows up as one glyph. most terminals don't track that. they just count codepoints and pray.

unless the terminal is using a grapheme-aware renderer and syncs with the font's shaping engine (like harfbuzz or coretext), it'll always guess wrong. wezterm and kitty kinda parse it right often.
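
To make the "7 codepoints, one glyph" point concrete, here is a minimal sketch using only Python's standard `unicodedata` module. The helper name `describe` is made up for illustration; the sequence is the usual man + woman + girl + boy family emoji.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Family emoji written out as escapes: man ZWJ woman ZWJ girl ZWJ boy.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

def describe(text):
    """Return a (code point, Unicode name) pair for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

if __name__ == "__main__":
    print(len(family), "code points")
    for codepoint, name in describe(family):
        print(codepoint, name)
```

A terminal that counts code points sees seven items here, three of which are joiners with no visible shape of their own.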

replies(4): >>44362822 #>>44363348 #>>44363640 #>>44363828 #
duped ◴[] No.44363348[source]
Why do you need to sync with the shaping engine?

TBH grapheme clusters are annoying but day 1 learning material for a text display widget that supports anything beyond ASCII. It honestly irks me how many things just fuck it up, because it's not an intractably hard problem - just annoying enough to be intractable for people that are lazy (*).

(*) the actually hard problem with grapheme clusters is that they're potentially unbounded in length and the standard is mutable, so your wcwidth() implementation needs to be updated along with standards to stay valid, particularly with emoji. This basically creates a software maintenance burden out of aether.
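
A rough sketch of why the width table is a maintenance burden, assuming a simplified wcwidth()-style rule built on Python's `unicodedata`. The names `cell_width` and `string_width` are hypothetical, and treating every format character as width 0 is a simplification, not what any particular libc does. The answers depend on whichever Unicode data version the interpreter ships with.

```python
import unicodedata

def cell_width(ch):
    """Simplified per-code-point width: 0 for combining/format, 2 for wide, else 1."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def string_width(text):
    """Sum per-code-point widths, the way a cluster-unaware terminal would."""
    return sum(cell_width(ch) for ch in text)

if __name__ == "__main__":
    family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
    # The table behind these answers is frozen at this Unicode version.
    print("Unicode data version:", unicodedata.unidata_version)
    print("per-code-point width of family emoji:", string_width(family))
```

Summing per code point gives 8 cells for the family sequence, while a renderer with a composed glyph would typically draw it in 2. Code points assigned after the bundled Unicode version get whatever default the old table has for them.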

replies(2): >>44363609 #>>44364807 #
zarzavat ◴[] No.44364807[source]
> Why do you need to sync with the shaping engine?

GP explained already. Grapheme clusters ≠ glyphs. To find the number of glyphs you need the font.

An emoji can render as one or two or three or more glyphs depending on what font the user has installed, because many emoji are formed by joining two or more emoji with a ZWJ.

(Also even in a monospace font not all glyphs are of ﷽ equal width)

replies(3): >>44365038 #>>44366271 #>>44366577 #
1. layer8 ◴[] No.44365038[source]
It's not the font that is deciding how emoji sequences are rendered. The renderer may decide based on which characters exist in the available fonts, but it doesn't have to. Same for glyph width in terminals. It wasn’t uncommon for non-double-width-aware terminals to only draw half an emoji in a regular-width cell.
replies(2): >>44365376 #>>44365586 #
2. zarzavat ◴[] No.44365376[source]
How else are you going to render a sequence such as Emoji ZWJ Emoji other than as two glyphs, if no composed glyph is defined in the user's font? That's how it's supposed to be rendered, for backwards compatibility.
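
The fallback described here can be sketched roughly as follows. `composed_glyphs` is a hypothetical stand-in for "ZWJ sequences this font has a single glyph for"; a real shaper also tries partial sub-sequences, which this sketch does not.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

def glyph_runs(sequence, composed_glyphs):
    """Split a ZWJ sequence into the pieces that would be drawn.

    If the (hypothetical) font coverage set contains the whole sequence,
    it is drawn as one glyph; otherwise fall back to the component emoji.
    """
    if sequence in composed_glyphs:
        return [sequence]
    return [part for part in sequence.split(ZWJ) if part]

if __name__ == "__main__":
    family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
    print(len(glyph_runs(family, {family})))  # font has the composed glyph
    print(len(glyph_runs(family, set())))     # font lacks it: components only
```

With the composed glyph available the sequence is one run; without it there are four, so the number of cells needed changes with font coverage.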
3. kccqzy ◴[] No.44365586[source]
> It wasn’t uncommon for non-double-width-aware terminals to only draw half an emoji in a regular-width cell.

And you are just describing bugs. This is not just an emoji issue: it will also fail to render CJK characters.
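
As an illustration of what double-width awareness means for the cell grid, here is a small sketch that assumes width comes from the East Asian Width property alone. `layout_row` is a made-up helper, and `None` is used as a made-up marker for the second cell of a wide character.

```python
import unicodedata

def layout_row(text, columns):
    """Place characters into a fixed-width row of cells.

    Wide (W/F) characters take two cells: the character itself plus a
    None continuation cell. Characters that do not fit are dropped.
    """
    row = []
    for ch in text:
        width = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        if len(row) + width > columns:
            break
        row.append(ch)
        if width == 2:
            row.append(None)  # continuation cell owned by the wide character
    row.extend([" "] * (columns - len(row)))
    return row

if __name__ == "__main__":
    print(layout_row("a\u4e2db", 6))
```

A terminal without the continuation cell gives the wide character one cell, which is the half-drawn emoji or CJK character described above.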