
Go subtleties

(harrisoncramer.me)
235 points darccio | 4 comments
1. mwsherman ◴[] No.45670197[source]
The article mentions that len() counts bytes, not “characters”. A further subtlety: a rune (codepoint) is still not necessarily a “character” in terms of what is displayed to users; that would be a “grapheme”.

A grapheme can be multiple codepoints, with modifiers, joiners, etc.

This is true in all languages; it’s a Unicode thing, not a Go thing. Shameless plug, here is a grapheme tokenizer for Go: https://github.com/clipperhouse/uax29/tree/master/graphemes
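
To make the byte/rune/grapheme distinction concrete, here is a minimal sketch using only the standard library. The helper name `counts` is made up for illustration. The standard library has no grapheme segmenter, so the grapheme count appears only in a comment; it would need a UAX #29 implementation such as the one linked above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the byte length and the
// rune (codepoint) count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "e" followed by U+0301 COMBINING ACUTE ACCENT.
	// A user sees one grapheme ("é"), but it is two codepoints.
	s := "e\u0301"
	b, r := counts(s)
	fmt.Println(b, r) // 3 2
}
```

The string is 3 bytes and 2 runes, yet displays as a single character, so neither len() nor a rune count matches what the user sees.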

replies(2): >>45670657 #>>45675222 #
2. HeyImAlex ◴[] No.45670657[source]
Here’s my favorite post on the subject https://adam-p.ca/blog/2025/04/string-length/
replies(1): >>45674894 #
3. debugnik ◴[] No.45674894[source]
Finally, an article that doesn't pretend grapheme clusters are the be-all and end-all of Unicode handling.

I'm saving this one. Not exactly how I'd explain it, but it's simplified enough to share with my current co-workers without being misleading.

4. virtualritz ◴[] No.45675222[source]
len() also returns int rather than uint/uint64 in Go.

I do not use Go but ran into this when I had to write a Go wrapper for some Rust stuff the other day. I was baffled.
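
For anyone hitting the same thing, a small sketch of what this looks like in practice. The function names are hypothetical and this is not taken from any particular wrapper. Because len() returns a signed int, passing a length to an API that expects an unsigned 64-bit value needs an explicit conversion. The signed type does make countdown loops straightforward, since the index can go below zero to end the loop.

```go
package main

import "fmt"

// lenAsUint64 is a hypothetical helper that converts the int returned by
// len to uint64, e.g. before handing a length to foreign code. len never
// returns a negative value, so the conversion does not lose information.
func lenAsUint64(b []byte) uint64 {
	return uint64(len(b))
}

// reversed returns a reversed copy of b. The index i is an int, so the
// condition i >= 0 eventually becomes false; with an unsigned index it
// would always be true.
func reversed(b []byte) []byte {
	out := make([]byte, 0, len(b))
	for i := len(b) - 1; i >= 0; i-- {
		out = append(out, b[i])
	}
	return out
}

func main() {
	data := []byte("abc")
	fmt.Println(lenAsUint64(data), string(reversed(data))) // 3 cba
}
```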