I love regular expressions but one thing I've learned over the years is the syntax is dense enough that even people who are confident enough to start writing regex tutorials often can't write a regex that matches an IP address.
I love regular expressions but one thing I've learned over the years is the syntax is dense enough that even people who are confident enough to start writing regex tutorials often can't write a regex that matches an IP address.
edit: asking partly, because in my current work I occassionally have to convince non-technical users to use one type of entry over other. For that reason, easy to read, simple regex wins over fancy, but convoluted regex.
Is it what `inet_addr` accept? In that case, "1", "0x1", "00.01", "00000.01", and more are all ip addresses. `ping` accepts all of em anyway.
Is a valid ipv6 address one with the square brackets around it? Is "::1" a valid ip address? What about "fe80::1%eth2"? ping accepts both of these on my machine (though probably not on yours, since you probably don't have an eth2 interface)
Also, `16843009` is an IP address, try pinging it.
(
(
25[0-5] # 250-255
|
2[0-4][0-9] # 200-249
|
1[0-9]{2} # 100-199
|
[1-9][0-9] # 10-99
|
[0-9]
)
\.
){3}
(
25[0-5] # 250-255
|
2[0-4][0-9] # 200-249
|
1[0-9]{2} # 100-199
|
[1-9][0-9] # 10-99
|
[0-9]
)
… but without all the nice white space and comments, unless you’re willing to discuss regex engines that let you do multi-line/commented literals like that… I think ruby does, not sure what other languages.The problem is that expressing “an integer from 0-255” is surprisingly complicated for regex engines to express. And that’s not even accounting for IP addresses that don’t use dots (which is legal as an argument to most software that connects to an IP address), as other commenters have pointed out.
Syntax error aside (there's an extra ] floating around), it's not even close to correct -- it'll match "999.999.999.000.999." among other things, will never match just one digit (there's a missing ?), and always insists on the trailing dot.
Sure, I'd take \d+\.\d+\.\d+\.\d+ over... "((2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])\.){3}(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])", assuming that I then validate the results afterwards.
You're right that Ruby has it. Perl also has /x, of course (since most of Ruby regex was "inspired" directly by Perl's syntax), as well as Python (re.VERBOSE). Otherwise, yeah, it's disappointingly rare.
<0-255>\.<0-255>\.<0-255>\.<0-255>
will only match full IPv4 addresses, but is a lot stricter than the one in the article.EDIT: formatting
For something like locating IP addresses in text, using a regex to identify candidates is a great idea. But as you show, you don’t want to implement the full validation in it. Use regex to find dotted digit groups, but validate the actual numeric values as a separate step afterwards.
Characters which are sometimes special, depending on context, are one more thing making regexes harder than they appear at first sight.
The author's willingness to publish code without even minimal testing does not inspire confidence.
(?:(?:\r\n)?[ \t])(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?: \r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\ ](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?: (?:\r\n)?[ \t])))|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n) ?[ \t]))\<(?:(?:\r\n)?[ \t])(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t] )))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))) :(?:(?:\r\n)?[ \t]))?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r \n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\]( ?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(? :\r\n)?[ \t])))\>(?:(?:\r\n)?[ \t]))|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))"(?:(?:\r\n)?[ \t])):(?:(?:\r\n)?[ \t])(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t] )(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))\<(?:(?:\r\n)?[ \t])(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)\](?:(?:\r\n)?[ \t])))):(?:(?:\r\n)?[ \t]))?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t])(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))\>(?:(?:\r\n)?[ \t]))(?:,\s( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:\.(?:( ?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t ])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(? :\.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))\<(?:(?:\r\n) ?[ \t])(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n) ?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\.(?:(?:\r\n)?[ \t] )(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))):(?:(?:\r\n)?[ \t]))? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:\.(?:(?: \r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]) ))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t]))(?:\ .(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)\](?:(?:\r\n)?[ \t])))\>(?:( ?:\r\n)?[ \t]))))?;\s)
Someone else then asks the absolute razor of a question: 'What value does this add over just verifying that the input is of the form {something}@{something}.{something}?'
Depends if {something} can contain periods for my email.
name@antispam.mydomain.com
Calling the extra ] a syntax error was a slight exaggeration on my behalf, but that was clearly an unintended extra character -- there's no way the author thinks "123].45].67].89]" is a valid IP address. But yes, it does compile and is interpreted as a valid regex, albeit not a useful one in this context.
The out-of-range values are not ideal but can be fixed with post-validation in code (which is cleaner than writing unnecessarily complicated regex, anyways). The missing ? leads to a bunch of false negatives, and the trailing . causes even more problems.