178 points dgl | 1 comments
duped ◴[] No.44363404[source]
In my fever dreams of maintaining UTF-8-supporting text widgets that work and never need to be updated, there's a zero-width whitespace grapheme cluster that gives the number of codepoints in the next grapheme cluster whenever that count differs from the previous cluster's.

The situation today is basically the same as null-terminated C strings, except worse: with C strings you can at least define the problem and solve it in linear time/space without needing to keep an up-to-date list of tables.
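
A minimal sketch of the length-marker idea, purely as an illustration: no such marker exists in Unicode. The marker is modelled here as a plain Python int in the stream, and `encode` and `decode` are hypothetical names. The first cluster always gets a marker, which is an assumption the comment leaves open. The point is only that decoding becomes a single linear pass with no character property tables.

```python
def encode(clusters):
    """Flatten clusters into one stream, emitting a length marker
    only when the cluster length differs from the previous one."""
    stream = []
    prev_len = None
    for cluster in clusters:
        if not cluster:
            raise ValueError("empty cluster")
        if len(cluster) != prev_len:
            stream.append(len(cluster))  # hypothetical marker, modelled as an int
            prev_len = len(cluster)
        stream.extend(cluster)  # the cluster's code points, one per item
    return stream


def decode(stream):
    """Recover clusters in one linear pass, using no Unicode tables."""
    clusters = []
    size = None
    i = 0
    while i < len(stream):
        if isinstance(stream[i], int):
            size = stream[i]  # marker: length of the following clusters
            i += 1
            continue
        if size is None or size < 1:
            raise ValueError("code point before any valid length marker")
        clusters.append("".join(stream[i:i + size]))
        i += size
    return clusters
```

Round-tripping `["a", "e\u0301", "x", "y"]` through `encode` and `decode` returns the same list, and only three markers are emitted because "x" and "y" share a length.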

replies(3): >>44363439 #>>44363883 #>>44366323 #
1. kps ◴[] No.44366323[source]
Combining characters and joiners should have been prefix rather than suffix/infix operators (and preferably in blocks by arity) so you'd always know without lookahead whether a grapheme cluster was complete.

(Prefix combining accents would also have made dead keys trivial rather than painful.)
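
A rough sketch of the lookahead point, under loud simplifications: it treats any code point with a nonzero canonical combining class as a combining mark, ignores joiners such as ZWJ (whose combining class is 0), and is nowhere near the full UAX #29 segmentation rules. The prefix ordering is hypothetical, not how Unicode works. Each generator yields a cluster together with the number of code points that had to be read before the cluster was known to be complete.

```python
import unicodedata


def is_mark(ch):
    # Simplification: nonzero canonical combining class means "combining mark".
    return unicodedata.combining(ch) != 0


def clusters_suffix_order(text):
    """Actual Unicode order: base first, marks after it."""
    current = ""
    read = 0
    for ch in text:
        read += 1
        if current and not is_mark(ch):
            # Only seeing the NEXT base tells us the previous cluster ended.
            yield current, read
            current = ""
        current += ch
    if current:
        yield current, read  # only known complete at end of input


def clusters_prefix_order(text):
    """Hypothetical order: marks first, base last."""
    current = ""
    read = 0
    for ch in text:
        read += 1
        current += ch
        if not is_mark(ch):
            # The base closes the cluster; no lookahead needed.
            yield current, read
            current = ""
    if current:
        yield current, read  # dangling marks with no base
```

For "e" + U+0301 followed by "x", the suffix-order version cannot emit the two-code-point cluster until it has read three code points. With the hypothetical prefix order (U+0301, "e", "x"), the first cluster is emitted after exactly two.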