
Phonetic Matching

(smoores.dev)
77 points raybb | 1 comments
asveikau ◴[] No.42172434[source]
The idea that "shore" and "sure" are pronounced "almost identically" would depend pretty heavily on your accent. The vowel is pretty different to me.

Also, the matches for "sorI" and "sorY" seem to me to misinterpret the words as ending in a pronounced vowel rather than a silent one. If you're using data meant for foreign surnames, whose rules may differ from English and in which silent vowels may be very rare depending on the original language, then of course you may mispronounce English words like this, reading both "shore" and "sure" as "sore-ee".

I'm sure there are much better ways to transcribe orthography to phonetics, and people have probably published libraries that do it. From some googling, it seems some people call this type of library a phonemic transcriber or IPA transcriber.

replies(5): >>42172850 #>>42173496 #>>42177414 #>>42179389 #>>42180312 #
1. smoores ◴[] No.42179389[source]
It's true, "sure" and "shore" are not pronounced exactly the same, and accents absolutely can vary, which is part of why Beider-Morse produces multiple encodings for each word. But the goal of Soundex-style phonetic encoding systems isn't to perfectly encode a word with a precise alphabet like the IPA. Rather, they intentionally introduce fuzziness so that words (really, names) that are pronounced similarly will be encoded the same way.
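
To make the "intentional fuzziness" concrete, here is a minimal sketch of classic American Soundex in Python. This is the simple ancestor of these systems, not Beider-Morse itself (BMPM is rule-based, language-aware, and emits multiple encodings per word), and the function name `soundex` and the table `CODES` are just illustrative choices. Similar-sounding consonants share a digit, vowels are dropped, and the result is padded or cut to four characters.

```python
import string

# Consonants that sound alike share a digit; vowels, h, w and y get no digit.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus up to three digits."""
    # Keep only ASCII letters; anything else is ignored in this sketch.
    letters = [ch for ch in name.lower() if ch in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel resets prev to "", so a repeated code after it counts again.
        prev = code
    # Pad with zeros, then truncate to exactly four characters.
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # both R163
print(soundex("sure"), soundex("shore"))     # both S600
```

Even this very coarse scheme maps "sure" and "shore" to the same code, which is the behavior you want for fuzzy name lookup and exactly what makes it a poor phonetic transcription.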

Perhaps "sure" and "shore" was a bad example; it's tricky to come up with these! And you're right that the encodings that happen to overlap for those words are technically "incorrect" pronunciations; again, these Soundex-style encoders are designed for surnames, not general English words. Some Storyteller users are testing out a version of Storyteller using this encoder to see if it makes any improvements (so far it seems like it's not worse, but not necessarily better!), but I won't be surprised if it doesn't end up making it into Storyteller long term.

Mostly I wrote this piece not to advocate for using BMPM to support forced alignment, but as a way to express the emotional journey that I found myself on as I learned more about these systems and where they came from.