Phonetic Matching

(smoores.dev)
77 points raybb | 1 comments
asveikau No.42172434
The idea that "shore" and "sure" are pronounced "almost identically" would depend pretty heavily on your accent. The vowel is pretty different to me.

Also, the matches for "sorI" and "sorY" seem to me to misread the words as ending in a pronounced vowel rather than a silent one. If you're using data meant for foreign surnames, whose rules may differ from English and in which silent vowels may be very rare depending on the original language, then of course you may mispronounce English words like this, saying both shore and sure as "sore-ee".

I'm sure there are much better ways to transcribe orthography to phonetics, and people have probably published libraries that do it. From some googling, it seems like some people call this type of library a phonemic transcriber or IPA transcriber.
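In its simplest form, a phonemic transcriber is a pronouncing-dictionary lookup. The sketch below is a minimal illustration, not any published library: the two tiny lexicons in `LEXICONS`, their accent labels (`merged_accent`, `distinct_accent`) and their IPA strings are hypothetical, chosen only to show that whether "shore" and "sure" match depends on which accent the dictionary encodes.

```python
from typing import Dict, Optional

# Minimal sketch of dictionary-based phonemic transcription.
# ASSUMPTION: these lexicons are tiny hand-written illustrations,
# not data from any real pronouncing dictionary or library.
LEXICONS: Dict[str, Dict[str, str]] = {
    # Hypothetical accent in which "shore" and "sure" share a vowel.
    "merged_accent": {"shore": "ʃɔː", "sure": "ʃɔː"},
    # Hypothetical accent in which the two vowels stay distinct.
    "distinct_accent": {"shore": "ʃɔɹ", "sure": "ʃʊɹ"},
}


def transcribe(word: str, accent: str) -> Optional[str]:
    """Return the phonemic transcription of `word`, or None if it is unknown."""
    return LEXICONS[accent].get(word.lower())


def sounds_alike(a: str, b: str, accent: str) -> bool:
    """True only if both words are in the lexicon and transcribe identically."""
    ta, tb = transcribe(a, accent), transcribe(b, accent)
    return ta is not None and ta == tb


for accent in LEXICONS:
    print(accent, sounds_alike("shore", "sure", accent))
```

A real transcriber would fall back to letter-to-sound rules for words missing from its dictionary; this sketch simply returns `None` for them.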

woodrowbarlow No.42173496
IPA (the International Phonetic Alphabet) is the tool linguistic researchers use most for encoding pronunciation in a standardized way. IPA is criticized for being a little bit Anglo-centric and falls short for some languages and edge cases, but overall it performs pretty well. (Learned from an ex who studies linguistics.)
tokinonagare No.42174382
The issue is not really IPA itself but how to use it. If you stay at the phonemic level, more words become comparable, but you hide distinctions that occur only in dialects. Also, for a lot of languages there are multiple competing models of which set of phonemes is involved. If you go down the phonetic rabbit hole, the notation quickly becomes hard to read. If you have to handle multiple variations, there are also diaphonemes, but those are even less standardized.
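The phonemic-versus-phonetic trade-off can be shown with /t/ in "butter": a speaker who flaps it says [bʌɾɚ], one who doesn't says [bʌtɚ]. The sketch below assumes a tiny hypothetical `ALLOPHONE_TO_PHONEME` table covering only this example, not a complete English phonology. Collapsing allophones to phonemes makes the two pronunciations match, which is exactly the dialect distinction that gets hidden.

```python
# Sketch: comparing narrow (phonetic) vs broad (phonemic) transcriptions.
# ASSUMPTION: this mapping is a tiny hypothetical table for illustration only.
ALLOPHONE_TO_PHONEME = {
    "ɾ": "t",   # alveolar flap, an allophone of /t/ (e.g. American "butter")
    "tʰ": "t",  # aspirated t, as in "top"
}


def to_phonemic(narrow: str) -> str:
    """Collapse a narrow transcription to a broad one via the table above."""
    result = narrow
    # Replace longer symbols first so multi-character allophones like "tʰ"
    # are handled before any shorter symbol could match part of them.
    for allophone in sorted(ALLOPHONE_TO_PHONEME, key=len, reverse=True):
        result = result.replace(allophone, ALLOPHONE_TO_PHONEME[allophone])
    return result


flapped = "bʌɾɚ"  # flapped /t/
careful = "bʌtɚ"  # unflapped /t/
print("phonetic match:", flapped == careful)
print("phonemic match:", to_phonemic(flapped) == to_phonemic(careful))
```

At the phonetic level the two strings differ; after collapsing to phonemes they are identical, so a matcher working on phonemes can no longer tell the two dialects apart.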