https://pastebin.com/raw/D7p7mRLK My comment is in a pastebin, since HN doesn't like Unicode.
In Rust you need this crate to deal with it; grapheme segmentation isn't part of the standard library:
https://crates.io/crates/unicode-segmentation
To my knowledge, the languages with this kind of feature built into the standard library are Swift, JavaScript, C#, and Java. Of those four, Swift is the only one that treats graphemes as the default unit when operating on strings. JavaScript requires Intl.Segmenter, C# requires StringInfo, and Java requires BreakIterator.
By the way, Python, the language that caused so much pain with its 2.x-to-3.x transition while promising better Unicode support in return, couldn't even get this right. There is no concept of graphemes in the standard library. So much for "batteries included." The test string below is a single emoji built from four code points joined by three zero-width joiners (U+200D):
>>> test = " "
>>> [char for char in test]
['', '\u200d', '', '\u200d', '', '\u200d', '']
>>> len(test)
7
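To see why "length" is ambiguous, here is a small Python sketch that counts the same emoji three ways. As a stand-in for the test string it assumes a four-person family emoji (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466); the specific code points are an illustrative choice. None of these counts is the 1 that a grapheme-aware API returns.

```python
# Assumed stand-in for the test string: a four-person family emoji
# (four code points joined by three U+200D zero-width joiners).
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

code_points = len(family)                            # what Python's len() counts
utf16_units = len(family.encode("utf-16-le")) // 2   # what JavaScript's .length counts
utf8_bytes = len(family.encode("utf-8"))             # what Rust's str::len() counts

print(code_points, utf16_units, utf8_bytes)  # 7 11 25
```

Each emoji code point lies outside the Basic Multilingual Plane, so it takes two UTF-16 code units and four UTF-8 bytes, while each ZWJ takes one code unit and three bytes.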
In the JavaScript REPL (Node.js):
> let test = " "
undefined
> [...new Intl.Segmenter().segment(test)][0].segment;
' '
> [...new Intl.Segmenter().segment(test)].length;
1
Works as it should.
In Python, you would need a third-party library.
Swift is truly the nicest programming language as far as strings are concerned. It just works, the way it always should have.
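For a sense of what such a library has to do, here is a deliberately rough Python sketch using only the standard library's unicodedata module. It is not an implementation of the Unicode text segmentation rules (UAX #29): it only attaches combining marks, variation selectors, emoji skin-tone modifiers, and zero-width-joined characters to the preceding character, and it ignores flags, Hangul syllables, CRLF, and the rest. The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def _extends(ch: str) -> bool:
    """Rough approximation: should ch attach to the preceding cluster?"""
    cp = ord(ch)
    if unicodedata.category(ch) in ("Mn", "Me", "Mc"):
        return True  # combining marks, e.g. U+0301 COMBINING ACUTE ACCENT
    if 0xFE00 <= cp <= 0xFE0F:
        return True  # variation selectors, e.g. U+FE0F emoji presentation
    if 0x1F3FB <= cp <= 0x1F3FF:
        return True  # emoji skin-tone modifiers
    return ch == ZWJ

def rough_graphemes(text: str) -> list[str]:
    """Split text into approximate grapheme clusters (NOT full UAX #29)."""
    clusters: list[str] = []
    join_next = False  # True right after a ZWJ: glue the next character on
    for ch in text:
        if clusters and (join_next or _extends(ch)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = ch == ZWJ
    return clusters

family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"  # assumed example
print(len(rough_graphemes(family)))     # 1
print(len(rough_graphemes("e\u0301")))  # 1: "e" plus a combining acute accent
```

Real segmentation needs the full rule set plus up-to-date Unicode property tables, which is exactly what the third-party packages and the built-in APIs in other languages provide.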
let test = " "
for char in test {
print(char)
}
print(test.count)
Output:
1
[Execution complete with exit code 0]
As a non-Apple user, I feel quite the Apple envy whenever I think about Swift. It's such a nice language, but there's little ecosystem for it outside of Apple UIs.
But man: no third-party libraries, no wrapper segmenter class or iterator. Just use the base string literals as is. It. Just. Works.