
205 points | ashvardanian | 1 comment
unwind ◴[] No.46288255[source]
Very cool and impressive performance.

I was worried by the article's use of U+212A (Kelvin symbol) as sample text, so I had to look it up [1]. I find it confusing when Unicode "shadows" of normal letters exist, and those are of course also dangerous in some cases when they can be mis-interpreted for the letter they look more or less exactly like.

Anyway, according to Wikipedia the dedicated symbol should not be used:

> However, this is a compatibility character provided for compatibility with legacy encodings. The Unicode standard recommends using U+004B K LATIN CAPITAL LETTER K instead; that is, a normal capital K.

That was comforting to me. :)

[1]: https://en.wikipedia.org/wiki/Kelvin#Orthography

replies(1): >>46288569 #
jjmarr ◴[] No.46288569[source]
> I find it confusing when Unicode "shadows" of normal letters exist, and those are of course also dangerous in some cases when they can be mis-interpreted for the letter they look more or less exactly like

Isn't this why Unicode normalization exists? This would let you compare Unicode letters and determine if they are canonically equivalent.
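
A minimal sketch of that kind of comparison, using Python's standard `unicodedata` module. The helper name `canonically_equivalent` is made up for illustration; it treats two strings as canonically equivalent when their NFD forms match.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    # Hypothetical helper, not a standard library function.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u00E9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# Different code point sequences, same canonical decomposition.
same = canonically_equivalent(composed, decomposed)

# Letters that only differ by case are not canonically equivalent.
different = canonically_equivalent("K", "k")
```

Note that this only covers equivalences that Unicode itself defines; it says nothing about characters that merely look alike.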

replies(2): >>46289094 #>>46289684 #
ComputerGuru ◴[] No.46289094[source]
Normalization wouldn’t address this.
replies(2): >>46289219 #>>46289262 #
1. happytoexplain ◴[] No.46289219[source]
What do you mean? All four normal forms of the Kelvin 'K' are the Latin 'K', as far as I can tell.
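
This is easy to check with Python's standard `unicodedata` module. A small sketch (the names `KELVIN`, `LATIN_K`, and `normal_forms` are just for illustration):

```python
import unicodedata

KELVIN = "\u212A"   # KELVIN SIGN
LATIN_K = "\u004B"  # LATIN CAPITAL LETTER K


def normal_forms(text: str) -> dict:
    """Normalize text under each of the four Unicode normalization forms."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


forms = normal_forms(KELVIN)

# True if every normal form of the Kelvin sign is the plain Latin 'K'.
all_latin_k = all(value == LATIN_K for value in forms.values())

for form, value in forms.items():
    print(form, f"U+{ord(value):04X}", unicodedata.name(value))
```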