
Go subtleties

(harrisoncramer.me)
234 points darccio | 1 comments
mwsherman ◴[] No.45670197[source]
The article mentions that len() counts bytes, not “characters”. A further subtlety: a rune (codepoint) is still not necessarily a “character” in terms of what is displayed to users; that would be a “grapheme”.

A grapheme can be multiple codepoints, with modifiers, joiners, etc.

This is true in all languages; it’s a Unicode thing, not a Go thing. Shameless plug: here is a grapheme tokenizer for Go: https://github.com/clipperhouse/uax29/tree/master/graphemes
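
To make the bytes vs. runes vs. graphemes distinction concrete, here is a minimal sketch using only the standard library. The helper name `counts` is made up for illustration, and the example deliberately does not call the tokenizer linked above, since its API is not shown here. The string is "e" followed by U+0301 (combining acute accent), which renders as a single "é":

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the byte length and the
// rune (code point) count of s. Neither is a grapheme count.
func counts(s string) (nBytes int, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// One user-visible character ("é") built from two code points:
	// 'e' (1 byte in UTF-8) + U+0301 COMBINING ACUTE ACCENT (2 bytes).
	s := "e\u0301"

	nBytes, nRunes := counts(s)
	fmt.Println(nBytes, nRunes) // 3 2
}
```

So the same string is 3 bytes, 2 runes, and 1 grapheme. Getting that last number requires segmentation per Unicode UAX #29, which the standard library does not provide.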

1. virtualritz ◴[] No.45675222[source]
len() also returns int rather than uint/uint64 in Go.

I do not use Go but ran into this when I had to write a Go wrapper for some Rust stuff the other day. I was baffled.
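
For anyone hitting the same thing, a rough sketch of what it looks like in practice. The helper `lenAsUint64` is hypothetical, standing in for whatever a wrapper around an unsigned length parameter might need; Go has no implicit numeric conversions, so the cast has to be written out:

```go
package main

import "fmt"

// lenAsUint64 is a hypothetical helper for a boundary that expects an
// unsigned length. len never returns a negative value, so converting
// its int result to uint64 does not lose information.
func lenAsUint64(buf []byte) uint64 {
	return uint64(len(buf))
}

func main() {
	buf := make([]byte, 5)

	fmt.Printf("%T\n", len(buf))  // int
	fmt.Println(lenAsUint64(buf)) // 5
}
```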