←back to thread

178 points dgl | 2 comments | | HN request time: 0.603s | source
Show context
b0a04gl ◴[] No.44362767[source]
emoji width bugs mostly come down to how terminals interpret Unicode's "grapheme clusters" vs "codepoints" vs "display cells". emoji isn't one codepoint - it's often multiple joined by zero-width joiners, variation selectors, skin tone modifiers. so the terminal asks wcwidth(), gets 1 or 2, but the actual glyph might render wider or combine into a single shape.

some emoji even change width depending on font. family emoji is like 7 codepoints, shows up as one glyph. most terminals don't track that. they just count codepoints and pray.

unless terminal is using a grapheme-aware renderer and syncs with the font's shaping engine (like freetype or coretext), it'll always guess wrong. wezterm and kitty kinda parse it right often

replies(4): >>44362822 #>>44363348 #>>44363640 #>>44363828 #
crackalamoo ◴[] No.44362822[source]
Yeah, unfortunately I feel like despite all the advances in Unicode tech, my modern terminal (MacOS) still bugs out badly with emojis and certain special characters.

I'm not sure how/when codepoints matter for wcwidth: my terminal handles many characters with more than one codepoint in UTF-8, like é and even Arabic characters, just fine.

replies(1): >>44363380 #
o11c ◴[] No.44363380[source]
`wcwidth` works by assigning all codepoints (strictly, code units of whatever size `wchar_t` is on your system, but thankfully modern Unixen are sane) a width of -1 (error), 0 (combining), 1 (narrow), or 2 (wide).

`wcswidth` could in theory work across multiple codepoints, but its API is braindead and cannot deal with partial errors.

This is all from the perspective of what the application expects to output. What the terminal itself does might be something completely different - decomposed Hangul in particular tends to lead to rendering glitches in curses-based terminal programs.

This is also different from what the (monospace) font expects to be rendered as. At least it has the excuse of not being able to call the system's `wcwidth`.

Note that it is always a mistake to call an implementation of `wcwidth` other than the one provided by the OS, since that introduces additional mismatches, unless you are using a better API that calculates bounds rather than an exact width. I posted an oversimplified sketch (e.g. it doesn't include versioning) of that algorithm a while back ...

https://news.ycombinator.com/item?id=43851532

replies(1): >>44363629 #
1. PhilipRoman ◴[] No.44363629[source]
As fallback, you can also just emit the character and see how far the cursor advanced via CSI 6n (try printf '\x1b[6n')
replies(1): >>44363692 #
2. o11c ◴[] No.44363692[source]
Doing that adds a lot of round trips, so you still really need to do the initial estimate.

(also, probing for whether the terminal actually supports various features is nontrivial. At startup you can send the basic "identify the terminal" sequences (there are 2) and check the result with a timeout; subsequently you can make a request then follow it with the basic terminal id to see if you actually get what you requested. But remember you can get arbitrary normal input interspersed.)