
206 points ashvardanian | 1 comments
andersa ◴[] No.46287769[source]
From a German user perspective, ICU and your fancy library are incorrect, actually. Mass is not a different casing of Maß, they are different characters. Google likely changed this because it didn't do what users wanted.
replies(5): >>46287929 #>>46288240 #>>46288242 #>>46288366 #>>46288467 #
1. Arnt ◴[] No.46288366[source]
Ah, let's have a long discussion of this.

Unicode avoids "different" and "same"; https://www.unicode.org/reports/tr15/ uses phrases like "compatibility equivalence" instead.

The whole thing is complicated, because it actually is complicated in the real world. You can spell the name of Gießen "Giessen" and most Germans consider it correct even if not ideal, but spelling Massachusetts "Maßachusetts" is plainly wrong in German text. The relationship between ß and ss isn't symmetric. Unicode captures that complexity once you get into the fine details.
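
To make the asymmetry concrete, here is a rough sketch using only Python's built-in string methods and the standard unicodedata module. It says nothing about how ICU or the library under discussion behave; the helper name same_ignoring_case is made up for illustration.

```python
import unicodedata


def same_ignoring_case(a: str, b: str) -> bool:
    """Hypothetical helper: caseless match via Unicode full case folding."""
    return a.casefold() == b.casefold()


word = "Maß"

# Full case folding maps ß to "ss", so a caseless comparison treats
# "Maß" and "Mass" as a match.
folded = word.casefold()

# Uppercasing turns ß into "SS"; lowercasing that gives "ss", not ß.
# The mapping only goes one way.
upper_then_lower = word.upper().lower()

# Normalization alone (even the compatibility form NFKC) leaves ß untouched;
# the ß/ss relationship comes from case mapping, not from normalization.
nfkc = unicodedata.normalize("NFKC", word)

print(folded, upper_then_lower, nfkc)
print(same_ignoring_case("Gießen", "Giessen"))
```

Nothing in these mappings ever turns "ss" back into ß, which is why "Massachusetts" never becomes "Maßachusetts", while a case-insensitive search for "Mass" will still hit "Maß". Whether that match is what a German user wants is a separate question from what the mappings do.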