The issue is that the formal definition of a regex only deals with whether a string belongs to the language recognized by the regex (boolean accept/reject), but regex in practice is usually about "find the substring (if any) that matches". That causes problems because a regex is equivalent to an NFA, so a given string can possibly be matched in multiple ways, which forces you to bring in the notion of a "greedy" vs. "non-greedy" match in order to disambiguate. Then add on top of that the desire to define sub-matches in terms of capturing groups, and it's just a complete mess. And that's not even getting to the PCRE extensions that go beyond the formal definition, like lookaround and backreferences (the latter can match languages that aren't regular at all).
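To make the ambiguity concrete, here is a rough sketch using Python's `re` module (my choice for illustration; other backtracking engines behave similarly, but the exact rules vary by engine). The sample strings and patterns are made up for the example.

```python
import re

text = "<a><b>"

# Formal view: a yes/no question about the whole string.
is_member = re.fullmatch(r"<.+>", text) is not None

# Practical view: "which substring matched?" Both "<a>" and "<a><b>"
# are valid answers for this pattern, so the quantifier has to choose.
greedy = re.search(r"<.+>", text).group(0)    # longest candidate
lazy = re.search(r"<.+?>", text).group(0)     # shortest candidate

# Capturing groups: "aaa" can be split between the two groups in four
# ways, and all of them are accepting. The engine's rules pick one.
greedy_groups = re.fullmatch(r"(a*)(a*)", "aaa").groups()
lazy_groups = re.fullmatch(r"(a*?)(a*)", "aaa").groups()

# Backreference: "some word, a space, then the same word again" is not
# a regular language, yet the extended syntax expresses it directly.
repeated = re.fullmatch(r"(\w+) \1", "hey hey") is not None

print(is_member, greedy, lazy, greedy_groups, lazy_groups, repeated)
```

Note that all of these patterns answer the accept/reject question the same way; they only differ in which of the possible matches gets reported, which is exactly the part the formal definition says nothing about.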