
Go subtleties

(harrisoncramer.me)
234 points darccio | 2 comments
mwsherman No.45670197
There is mention of how len() is bytes, not “characters”. A further subtlety: a rune (codepoint) is still not necessarily a “character” in terms of what is displayed for users — that would be a “grapheme”.

A grapheme can be multiple codepoints, with modifiers, joiners, etc.

This is true in all languages; it’s a Unicode thing, not a Go thing. Shameless plug: here is a grapheme tokenizer for Go: https://github.com/clipperhouse/uax29/tree/master/graphemes
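
To make the bytes vs. codepoints vs. graphemes distinction concrete, here is a minimal sketch using only the standard library. The helper name `counts` and the two sample strings are illustrative assumptions, not taken from the article. Each sample renders as a single user-perceived character where the font supports it, yet neither `len()` nor the rune count reports 1. Counting graphemes themselves needs a UAX #29 segmenter such as the library linked above, which is deliberately not shown here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the UTF-8 byte length
// and the number of runes (Unicode codepoints) in s.
func counts(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as one "é".
	accented := "e\u0301"

	// MAN + ZERO WIDTH JOINER + WOMAN + ZERO WIDTH JOINER + GIRL:
	// displayed as one family emoji where supported.
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467"

	b, r := counts(accented)
	fmt.Println("accented e:", b, "bytes,", r, "runes") // 3 bytes, 2 runes

	b, r = counts(family)
	fmt.Println("family emoji:", b, "bytes,", r, "runes") // 18 bytes, 5 runes
}
```

Ranging over a string with `for _, r := range s` yields the same runes that `utf8.RuneCountInString` counts, so it also splits these samples into 2 and 5 pieces respectively.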

replies(2): >>45670657 #>>45675222 #
1. HeyImAlex No.45670657
Here’s my favorite post on the subject: https://adam-p.ca/blog/2025/04/string-length/
replies(1): >>45674894 #
2. debugnik No.45674894
Finally, an article that doesn't pretend grapheme clusters are the be-all and end-all of Unicode handling.

I'm saving this one. Not exactly how I'd explain it, but it's simplified enough to share with my current co-workers without being misleading.