Regex Isn't Hard (2023)

(timkellogg.me)
75 points asicsp | 2 comments
michaelt ◴[] No.43750496[source]
> e.g. This pattern ([0-9][0-9]?[0-9]][.])+ matches one, two or three digits followed by a . and also matches repeated patterns of this. This wold match an IP address (albeit not strictly).

I love regular expressions but one thing I've learned over the years is the syntax is dense enough that even people who are confident enough to start writing regex tutorials often can't write a regex that matches an IP address.

ninkendo ◴[] No.43750693[source]
Writing one correctly is a pretty complicated task if you’re trying to write a simple tutorial… off the top of my head, you’d need:

    (
      (
      25[0-5] # 250-255
      |
      2[0-4][0-9] # 200-249
      |
      1[0-9]{2} # 100-199
      |
      [1-9][0-9] # 10-99
      |
      [0-9]
      )
      \.
    ){3}
    (
      25[0-5] # 250-255
      |
      2[0-4][0-9] # 200-249
      |
      1[0-9]{2} # 100-199
      |
      [1-9][0-9] # 10-99
      |
      [0-9]
    )
    
… but without all the nice white space and comments, unless you’re willing to discuss regex engines that let you do multi-line/commented literals like that… I think ruby does, not sure what other languages.

The problem is that “an integer from 0-255” is surprisingly complicated to express in a regex. And that’s not even accounting for IP addresses that don’t use dots (which are legal as an argument to most software that connects to an IP address), as other commenters have pointed out.
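
For contrast, here is a rough Python sketch of the non-regex route: split on dots and range-check each piece as an integer. The function names (is_octet, is_ipv4_by_parsing, from_dotless) are made up for illustration, and rejecting leading zeros is an assumption chosen to mirror the pattern above. The dotless case is shown with the standard library's ipaddress.IPv4Address, which accepts a plain int (not the digit string).

```python
import ipaddress


def is_octet(text: str) -> bool:
    """True for ASCII decimal strings 0-255 without leading zeros (assumed rule)."""
    return (
        text.isascii()
        and text.isdigit()
        and (text == "0" or not text.startswith("0"))
        and int(text) <= 255
    )


def is_ipv4_by_parsing(text: str) -> bool:
    """Check a dotted quad by splitting and range-checking, with no regex."""
    parts = text.split(".")
    return len(parts) == 4 and all(is_octet(p) for p in parts)


def from_dotless(value: int) -> str:
    """Convert the single-integer form of an IPv4 address to dotted notation."""
    return str(ipaddress.IPv4Address(value))
```

Here is_ipv4_by_parsing("10.0.0.256") is rejected by the int(text) <= 255 comparison, which is the part that takes five alternatives in the regex, and from_dotless(2130706433) gives "127.0.0.1".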

1. vitus ◴[] No.43750866[source]
> I think ruby does, not sure what other languages.

You're right that Ruby has it. Perl also has /x, of course (since most of Ruby's regex syntax was "inspired" directly by Perl's), and Python has re.VERBOSE. Otherwise, yeah, it's disappointingly rare.
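
As a minimal sketch of what the Python version might look like with re.VERBOSE: the names OCTET, IPV4 and is_dotted_quad are invented for this example, and using fullmatch instead of search is an assumption about what a validator would want, since the pattern itself has no anchors.

```python
import re

# One octet, 0-255, without leading zeros. Whitespace and "#" comments
# inside the pattern are ignored because of re.VERBOSE below.
OCTET = r"""
    (?:
        25[0-5]        # 250-255
      | 2[0-4][0-9]    # 200-249
      | 1[0-9]{2}      # 100-199
      | [1-9][0-9]     # 10-99
      | [0-9]          # 0-9
    )
"""

# Three "octet." groups followed by a final octet.
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}", re.VERBOSE)


def is_dotted_quad(text: str) -> bool:
    """True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None
```

Building the pattern from a single OCTET string avoids writing the five alternatives twice. With fullmatch, "192.168.0.1" is accepted while "256.1.1.1" and "1.2.3" are rejected.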

2. bazoom42 ◴[] No.43756671[source]
.NET also supports verbose regex.