This is sort of the inverse of the problem IPA is trying to solve. You're correct that IPA is used to encode pronunciation. But phonetic matching is trying to handle the cases where different people, with different accents (maybe different languages), say or write semantically the same thing in different forms -- and you need to find all of those variants starting from just one of them, without also pulling in things that are irrelevant.
Basically it's trying to smush all the different versions together into a single sort of cluster, where the identity of the cluster is any of the versions.
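To make the "one cluster, keyed by any of its members" idea concrete, here's a minimal sketch in Python. The key function is a made-up toy (drop vowels, collapse a few consonant groups), not any particular production algorithm -- real systems use Soundex/Metaphone-style rules or learned models:

    from collections import defaultdict

    def toy_key(name: str) -> str:
        name = name.lower()
        # collapse a few consonant groups that often sound alike
        for group, rep in (("ck", "k"), ("ph", "f"), ("gh", "g")):
            name = name.replace(group, rep)
        # drop vowels after the first letter, then squeeze repeated letters
        first, rest = name[0], name[1:]
        rest = "".join(c for c in rest if c not in "aeiouy")
        squeezed = first
        for c in rest:
            if c != squeezed[-1]:
                squeezed += c
        return squeezed

    def cluster(names):
        groups = defaultdict(list)
        for n in names:
            groups[toy_key(n)].append(n)
        return dict(groups)

    print(cluster(["Smith", "Smyth", "Smithe", "Schmidt", "Jones"]))
    # {'smth': ['Smith', 'Smyth', 'Smithe'], 'schmdt': ['Schmidt'], 'jns': ['Jones']}

The key is the cluster's identity: any member regenerates it, so any member can be used to look up all the others.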
I used to work in this field about 30 years ago, specifically on how names end up being latinized when they come from non-Latin scripts. We were very focused on collapsing variants into a complex ruleset that could be used both to recognize a cluster of names as the same "thing" and to produce all the valid variants. It was very much a kind of applied "expert systems" approach that predated ML.
The rulesets were more or less context-free grammars and regular expressions that we could use to "decompile" a name token into a kind of limited regular expression (no infinite closures) and then recompile that expression back into a list of other variants. Each variant in turn was supposed to "decompile" back into the same expression, so a name could be part of a kind of closed algebra of names all with the same semantic meaning.
For example:
A Korean name like "Park" might turn into a {rule} that would also generate "Pak", "Paek", "Baek", etc.
Any one of those would also generate the same {rule}.
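To sketch what such a {rule} might look like (purely illustrative -- the real rulesets were far richer CFG/regex hybrids built by linguists), here's a hypothetical rule as a sequence of finite alternation groups, with no Kleene star so it can be expanded exhaustively:

    from itertools import product

    # Hypothetical rule covering some latinizations of the Korean surname "Park".
    PARK_RULE = [("p", "b"), ("a", "ae"), ("", "r"), ("k",)]

    def recompile(rule):
        """Expand a rule into every variant it generates."""
        return {"".join(parts) for parts in product(*rule)}

    def decompiles_to(name, rule):
        """Check whether a name is one of the rule's variants (the 'decompile'
        step, done here by brute force rather than by parsing)."""
        return name.lower() in recompile(rule)

    print(sorted(recompile(PARK_RULE)))
    # ['baek', 'baerk', 'bak', 'bark', 'paek', 'paerk', 'pak', 'park']
    print(decompiles_to("Park", PARK_RULE), decompiles_to("Baek", PARK_RULE))
    # True True

Note the overgeneration ("baerk", "paerk") -- exactly the kind of precision leak, mentioned below, that keeps the set of names from forming a clean closed algebra.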
In practice it worked surprisingly well, and the struggle was mostly in identifying the areas where precision/recall problems in this scheme kept the names from forming a closed algebra.
Building the rules was an ungodly amount of human labor though, with expert linguists involved at every step.
These days I'm sure the problem would be approached in an entirely different way.