
Regex Isn't Hard (2023)

(timkellogg.me)
75 points asicsp | 2 comments
gwd ◴[] No.43750572[source]
So my brother doesn't code for a living, but has done a fair amount of personal coding, and also gotten into the habit of watching live-coding sessions on YouTube. Recently he's gotten involved in my project a bit, and so we've done some pair programming sessions, in part to get him up to speed on the codebase, in part to get him up to speed on more industrial-grade coding practices and workflows.

At some point we needed to do some parsing of some strings, and I suggested a simple regex. But apparently a bunch of the streamers he's been watching basically have this attitude that regexes stink, and you should use basically anything else. So we had a conversation, and compared the clarity of coding up the relatively simple regex I'd made, with how you'd have to do it procedurally; I think the regex was a clear winner.

Obviously regexes aren't the right tool for every job, and they can certainly be done poorly; but in the right place at the right time they're the simplest, most robust, easiest to understand solution to the problem.

replies(1): >>43750627 #
kelafoja ◴[] No.43750627[source]
My problem is that regexes are write-only: unreadable once written (to me anyway). And sometimes they do more than you intended. You may have tested on a few inputs and declared it fit for purpose, but there might be other inputs on which it has unintended effects. I don't mind simple, straightforward regexes. But when they become more complex, I tend to prefer to write out the procedural code, even if it is (much) longer in terms of lines. I find that generally I can read code better than regexes, and that code I write is more predictable than regexes I write.
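
To illustrate the kind of surprise meant here, a minimal sketch in Python (the function names and sample inputs are hypothetical, chosen only for illustration): a check that passes a few hand-picked inputs can still accept more than intended, because `re.match` anchors only at the start of the string.

```python
import re

def is_number_loose(text):
    # re.match anchors only at the start, so anything may follow the digits.
    return re.match(r"[0-9]+", text) is not None

def is_number_strict(text):
    # re.fullmatch requires the whole string to match the pattern.
    return re.fullmatch(r"[0-9]+", text) is not None

for sample in ["42", "42abc", "abc"]:
    print(sample, is_number_loose(sample), is_number_strict(sample))
```

Testing only "42" and "abc" would not reveal the difference; "42abc" passes the loose check and fails the strict one.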
replies(6): >>43750642 #>>43750826 #>>43751127 #>>43751152 #>>43751569 #>>43751927 #
bazoom42 ◴[] No.43751127[source]
> I tend to prefer to write out the procedural code, even if it is (much) longer in terms of lines.

This might work for you, but in general the number of bugs is proportional to the amount of code. The regex engine is already thoroughly tested by someone else, while a custom implementation in procedural code will probably have bugs and be a lot more work to maintain if the pattern changes.

replies(3): >>43751445 #>>43753974 #>>43765539 #
rerdavies ◴[] No.43753974[source]
In general, the correctness of the code is proportional to its readability.

I also prefer procedural code instead of regexes.

replies(1): >>43755629 #
bazoom42 ◴[] No.43755629[source]
Surely complexity is a factor? A procedural implementation will necessarily have the same essential complexity as the regex it replaces, but then it will additionally have a bunch of incidental complexity in matching and looping and backtracking.

Regexes can certainly be hard to read - the solution is to use formatting and comments to make them easier to understand - not to drown the logic in reams of boilerplate code.
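
A minimal sketch of the formatting-and-comments approach, assuming Python's `re.VERBOSE` flag. The date format and the names `DATE_RE` and `parse_date` are hypothetical, picked only for illustration:

```python
import re

# Hypothetical pattern for dates written like 2023-04-17.
# With re.VERBOSE, whitespace in the pattern is ignored and
# "#" starts a comment, so each piece can be documented inline.
DATE_RE = re.compile(
    r"""
    (?P<year>[0-9]{4})    # four-digit year
    -
    (?P<month>[0-9]{2})   # two-digit month
    -
    (?P<day>[0-9]{2})     # two-digit day
    """,
    re.VERBOSE,
)

def parse_date(text):
    """Return (year, month, day) as ints, or None if text does not match."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return None
    return int(m["year"]), int(m["month"]), int(m["day"])
```

The pattern only checks the shape of the string; it does not reject impossible dates such as month 13.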

replies(1): >>43765553 #
kelafoja ◴[] No.43765553{3}[source]
> A procedural implementation will necessarily have the same essential complexity as the regex it replaces

I don't think I fully agree with this, and I don't see a basis for why this should be true. If I have a very specific implementation, it could have very little incidental complexity, it could be fully targeted to the use case. Whereas with regular expressions there is incidental complexity of the regex engine itself by definition.

replies(1): >>43771185 #
1. bazoom42 ◴[] No.43771185{4}[source]
Complexity in the standard library is not that relevant. If you make your own custom dictionary implementation, you increase the complexity of your code base compared to just using the one in the standard library, even if your own implementation is simpler.

The relevant complexity for using a regex is the complexity of the pattern itself and the complexity of invoking the regex. Any custom procedural solution will be more complex unless it is literally something as simple as checking whether a string contains a given literal string.
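
A rough side-by-side sketch of that comparison in Python, assuming a hypothetical `name=number` input format (the format and all names here are made up for illustration). Both functions are meant to accept exactly the same strings:

```python
import re
import string

# Regex version: the pattern plus one call to invoke it.
PAIR_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=([0-9]+)")

def parse_regex(text):
    m = PAIR_RE.fullmatch(text)
    return (m.group(1), int(m.group(2))) if m else None

# Procedural version of the same rules, written out by hand.
NAME_START = set(string.ascii_letters + "_")
NAME_REST = NAME_START | set(string.digits)

def parse_procedural(text):
    # Split at the first "="; sep is empty if there is none.
    name, sep, value = text.partition("=")
    if not sep or not name or not value:
        return None
    if name[0] not in NAME_START:
        return None
    if any(ch not in NAME_REST for ch in name[1:]):
        return None
    if any(ch not in string.digits for ch in value):
        return None
    return name, int(value)
```

Which of the two reads better is the point under debate; the sketch only shows what each side has to maintain if the format changes.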

replies(1): >>43792521 #
2. rerdavies ◴[] No.43792521[source]
For some arbitrary definition of complex.