Regex Isn't Hard (2023)

(timkellogg.me)
75 points asicsp | 1 comment
lairv ◴[] No.43750827[source]
My issue with regexes is that the formal definition of regex I learned at university is clear and simple [0], but using them in programming languages is always a mess.

[0] https://en.wikipedia.org/wiki/Regular_expression#Formal_lang...

replies(1): >>43769129 #
1. krackers ◴[] No.43769129[source]
The issue is that the formal definition of regex only deals with whether a string belongs to the language recognized by the regex or not (boolean accept/non-accept), but regex in practice often talks in terms of "find the substring (if any) that matches". That causes issues because a regex is equivalent to an NFA, so a given string can possibly be matched in multiple ways, which forces you to bring in the notion of a "greedy" vs "non-greedy" match in order to disambiguate. Add on top of that the desire to define sub-matches in terms of capturing groups, and it's just a complete mess. And that's not even getting to the not-strictly-regular PCRE extensions like lookaround, backreferences, etc.
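
To make that concrete, here is a minimal sketch using Python's standard `re` module (my choice for illustration; the thread doesn't name a language, and the sample strings and variable names are made up). It tries to show the gap between "does the whole string belong to the language" and "which substring, and which sub-matches, did we pick".

```python
import re

text = "<a><b>"

# Formal view: a boolean answer to "is the whole string in the language?"
accepts = re.fullmatch(r"<.+>", text) is not None

# Practical view: "find the substring that matches". More than one
# substring qualifies, so the engine needs a disambiguation policy.
greedy = re.search(r"<.+>", text).group(0)   # quantifier takes as much as it can
lazy = re.search(r"<.+?>", text).group(0)    # quantifier takes as little as it can

# Capturing groups: both patterns accept exactly the same strings,
# but they split "aaa" between the two groups differently.
split_greedy = re.fullmatch(r"(a*)(a*)", "aaa").groups()
split_lazy = re.fullmatch(r"(a*?)(a*)", "aaa").groups()

# Backreference: "the same word twice" is not a regular language,
# yet the pattern syntax allows it.
doubled = re.fullmatch(r"(\w+) \1", "hey hey") is not None
not_doubled = re.fullmatch(r"(\w+) \1", "hey you") is not None

print(accepts, greedy, lazy)
print(split_greedy, split_lazy)
print(doubled, not_doubled)
```

The accept/reject answers never depend on greediness; only the reported substrings and group contents do, which is the part the formal definition says nothing about.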