Parser Combinators Beat Regexes

1. austin-cheney ◴[10 Apr 25 00:10 UTC] No.43639341[source]▶

There are numerous posts, comments, articles and so forth about when to use regex versus using a parser. The general rule is this:

If you need to search a string, such as finding needle(s) in a haystack, regex is probably a better fit than a parser. If you need anything more intelligent than a list of search results, you probably want a full formal parser.
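As a rough illustration of the "list of search results" case, here is a minimal Python sketch. The haystack string and the needle pattern are made up for the example; the only real API used is `re.findall` from the standard library.

```python
import re

# Hypothetical haystack; the needle format (capital letter, hyphen,
# digits) is an assumption made purely for illustration.
haystack = "ids: A-102, B-7, junk, C-3344"

# findall returns every non-overlapping match as a plain list of strings.
needles = re.findall(r"[A-Z]-\d+", haystack)
print(needles)
```

All you get back is a flat list of strings. Once you need to know how the pieces relate to each other, that is where a parser starts to pay off.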

Most languages allow a form of nested regex that allows for increased search precision. This occurs when a method that uses a regex hands each matching string to a function as its argument (for example, a replace call that takes a callback), which is why regex is probably enough when the business is primitive. There is a tremendously higher cost to using a full parser, considering the lexer and tokenizer plus rules, but it’s so much more intelligent that it’s not even comparable.
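One way to read the "nested regex" idea, sketched in Python: an outer regex finds candidate spans, and a callback receives each match and runs a second regex on just that text. The input string and the `redact_digits` helper are hypothetical; `re.sub` accepting a function as the replacement is standard library behavior.

```python
import re

def redact_digits(match: re.Match) -> str:
    # Inner regex: runs only on the text the outer regex matched.
    return re.sub(r"\d", "#", match.group(0))

# Hypothetical input for illustration.
text = "order 42 shipped to <card 1234-5678> on day 7"

# Outer regex: find each <...> span and pass it to the callback.
result = re.sub(r"<[^>]*>", redact_digits, text)
print(result)
```

Digits outside the angle brackets are left alone, which is the extra precision the nesting buys without writing a lexer or grammar.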

2. giraffe_lady ◴[10 Apr 25 00:46 UTC] No.43639533[source]▶

>>43639341 (TP) #

The key thing for me in making this decision is of course predicting the future.

Parsers are more work up front but in my experience much easier to debug, maintain and extend. So if I predict something will be around for a while I'll want to do a parser and if it's just a one-off I usually go with regex.

Of course this predictably leads to me having a collection of parsers that I've never needed to touch again and a handful of monstrously complex regex grammars that I'll rewrite into a parser "the next time" I need to modify them.

I still think it's the right thing to base the decision on; I just need to keep improving my prophesying.

3. kleiba ◴[10 Apr 25 06:19 UTC] No.43641180[source]▶

>>43639341 (TP) #

Of course you could also pull out the good old Chomsky hierarchy and make an argument against regexes based on whatever the nature of your data is.
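To make the Chomsky hierarchy point concrete: balanced parentheses are the textbook example of a language that is context-free but not regular, so a classic regular expression cannot recognize it, while a few lines of recursive descent can. The sketch below is only an illustration; the function name and the grammar in the docstring are my own choices.

```python
def parse_balanced(s: str) -> bool:
    """Return True if s consists solely of balanced parentheses.

    Grammar (context-free, not regular): S -> "(" S ")" S | empty
    """
    def parse_s(i: int) -> int:
        # Consume zero or more "( S )" groups starting at index i and
        # return the index just past the consumed text.
        while i < len(s) and s[i] == "(":
            i = parse_s(i + 1)          # nested S
            if i >= len(s) or s[i] != ")":
                raise ValueError(f"expected ')' at position {i}")
            i += 1                      # consume the closing ")"
        return i

    try:
        # Valid only if the whole input was consumed.
        return parse_s(0) == len(s)
    except ValueError:
        return False
```

Here `parse_balanced("(()())")` accepts while `parse_balanced("(()")` and `parse_balanced(")(")` reject. The recursion depth tracks the nesting depth, which is exactly the unbounded memory a finite automaton lacks.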

But the thing is: the beauty of regexes lies in their compactness (which, in turn, can make them quite difficult to debug). So, of course, if you want to optimize for some other measure, you'd use an alternative approach (e.g. a parser). And there are a number of worthwhile measures, such as the already mentioned debuggability, appropriateness for the problem at hand in terms of complexity, processing speed, ease of using the match result, the ability to get multiple alternative matches, support for regexes in your language of choice, etc.

But simply stating "X beats regexes" without saying in what respect leaves something to be desired.