←back to thread

178 points dgl | 1 comments | | HN request time: 0.234s | source
Show context
b0a04gl ◴[] No.44362767[source]
emoji width bugs mostly come down to how terminals interpret Unicode's "grapheme clusters" vs "codepoints" vs "display cells". emoji isn't one codepoint - it's often multiple joined by zero-width joiners, variation selectors, skin tone modifiers. so the terminal asks wcwidth(), gets 1 or 2, but the actual glyph might render wider or combine into a single shape.

some emoji even change width depending on font. family emoji is like 7 codepoints, shows up as one glyph. most terminals don't track that. they just count codepoints and pray.

unless terminal is using a grapheme-aware renderer and syncs with the font's shaping engine (like freetype or coretext), it'll always guess wrong. wezterm and kitty kinda parse it right often

replies(4): >>44362822 #>>44363348 #>>44363640 #>>44363828 #
1. Joker_vD ◴[] No.44363640[source]
The main problem is not even if the terminal itself can track the grapheme width "correctly". It's a) the fonts suck; b) does the terminal user tracks the width correctly?

About a): some fonts have the glyphs for e.g. the playing cards block that are 1.5 columns wide even though the code points themselves are defined to be Narrow. How do you render that properly? Then there are variation selectors: despite what some may think, they don't affect the East Asian Width of the preceding code point, so whether you print "\N{ALEMBIC}\N{VARIATION SELECTOR-15}" or "\N{ALEMBIC}\N{VARIATION SELECTOR-16}", it still, according to wcwidth(), takes 1 column; but fonts have glyphs that are, again, 1.5 and 2 cells wide.

And then there is the elephant in the room problem b) which is management of cursor position. But the terminal, and the app that uses the terminal need to have exactly the same idea of where the cursor is, or e.g. readline can't reliably function, or colorful grep output. You need to know how many lines of text you've output (to be able to erase them properly), and whether the cursor is at the leftmost column (because of \b semantics) or at the rightmost column (because xenl is a thing) or neither. And no, requesting the cursor position report from the terminal doesn't really work, it's way too slow and it's interspersed with the user input.

The TUI paradigm really breaks down completely the moment the client is unsure how its output affects the cursor movement in the terminal. And terminals don't help much either! Turning off autowrap is mostly useless (the excess output is not discared, it overwrites the rightmost column instead), the autobackwrap (to make \b go to the previous line from the leftmost column) is almost unsupported and has its own issues, there is no simple command/escape sequence to go to the rightmost column... Oh, and there is xenl behaviour, which has many different subtle variations, and which original VT100 didn't even properly have despite what terminfo manual page may tell you — you can try it with the terminal emulator mentioned in TFA for yourself: go to setup, press 4, 5, move with right arrow to the block 3 and turn the second bit in it on by pressing 6 so it looks like "3 0100", exit setup (what you did is put the temrinal into the local mode so you can input text to it from your keyboard and turned the autowrap on), then do ESC, print "[1;79Hab", do LINE-FEED, print "cd" — you'll see that there is an empty line which shouldn't really be there, and it is not there if you do e.g. printf "\033[1;1Hxx\033[1;79Hab\ncd" on xterm (ironic, given how xterm's maintainer prides themself on being very faithful to original VT100 behaviour) or any other modern terminal.