Regex Isn't Hard (2023)

(timkellogg.me)
75 points asicsp | 20 comments
1. gwd ◴[] No.43750572[source]
So my brother doesn't code for a living, but has done a fair amount of personal coding, and also gotten into the habit of watching live-coding sessions on YouTube. Recently he's gotten involved in my project a bit, and so we've done some pair programming sessions, in part to get him up to speed on the codebase, in part to get him up to speed on more industrial-grade coding practices and workflows.

At some point we needed to do some parsing of some strings, and I suggested a simple regex. But apparently a bunch of the streamers he's been watching basically have this attitude that regexes stink, and you should use basically anything else. So we had a conversation, and compared the clarity of coding up the relatively simple regex I'd made, with how you'd have to do it procedurally; I think the regex was a clear winner.

Obviously regexes aren't the right tool for every job, and they can certainly be done poorly; but in the right place at the right time they're the simplest, most robust, easiest to understand solution to the problem.
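
As a rough illustration of that comparison (not the actual code from the pairing session), here is a minimal Python sketch that parses the same string both ways. The input format, a hypothetical version string such as "v1.2.3" with an optional "-tag" suffix, and the names VERSION_RE, parse_regex and parse_procedural are assumptions made up for this example.

```python
import re

# Hypothetical format: "v<major>.<minor>.<patch>" with an optional "-<tag>" suffix.
VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)(?:-(\w+))?")


def parse_regex(s):
    """Parse with a single pattern; returns None when the string does not fit."""
    m = VERSION_RE.fullmatch(s)
    if m is None:
        return None
    major, minor, patch, tag = m.groups()
    return int(major), int(minor), int(patch), tag


def parse_procedural(s):
    """Hand-written equivalent: each rule in the pattern becomes an explicit check."""
    if not s.startswith("v"):
        return None
    core, sep, tag = s[1:].partition("-")
    # If a "-" is present, the tag must be non-empty and made of word characters.
    if sep and not (tag and all(c.isalnum() or c == "_" for c in tag)):
        return None
    parts = core.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch, (tag if sep else None)


samples = ["v1.2.3", "v10.0.7-beta", "v1.2", "v1..3", "1.2.3", "v1.2.3-"]
for s in samples:
    print(s, parse_regex(s), parse_procedural(s))
```

Both functions agree on these sample inputs; which one reads better is the judgment call being discussed.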

replies(1): >>43750627 #
2. kelafoja ◴[] No.43750627[source]
My problem is that regexes are write-only, unreadable once written (to me anyway). And sometimes they do more than you intended. You maybe tested on a few inputs and declared it fit for purpose, but there might be more inputs upon which it has unintended effects. I don't mind simple, straightforward regexes. But when they become more complex, I tend to prefer to write out the procedural code, even if it is (much) longer in terms of lines. I find that generally I can read code better than regexes, and that code I write is more predictable than regexes I write.
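
A small Python sketch of the "does more than you intended" failure mode, under assumed requirements: the goal is to accept strings like "3.14" and nothing else. The patterns and the helper names naive_is_decimal and strict_is_decimal are hypothetical, chosen only to show how a pattern can pass a quick test and still accept other inputs.

```python
import re

NAIVE = re.compile(r"\d+.\d+")     # unescaped dot matches any character
STRICT = re.compile(r"\d+\.\d+")   # escaped dot matches only a literal "."


def naive_is_decimal(s):
    # match() anchors only at the start, so trailing text is silently accepted.
    return NAIVE.match(s) is not None


def strict_is_decimal(s):
    # fullmatch() requires the whole string to fit the pattern.
    return STRICT.fullmatch(s) is not None


samples = ["3.14", "3x14", "3.14abc", "1234", "abc"]
for s in samples:
    print(f"{s!r:10} naive={naive_is_decimal(s)} strict={strict_is_decimal(s)}")
```

The naive version passes the obvious test ("3.14") but also accepts "3x14", "3.14abc" and "1234"; a table of sample inputs like this is one way to catch that before declaring the pattern fit for purpose.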
replies(6): >>43750642 #>>43750826 #>>43751127 #>>43751152 #>>43751569 #>>43751927 #
3. fragmede ◴[] No.43750642[source]
You know you can write comments in your code where the regexp is, right?
replies(2): >>43750686 #>>43765568 #
4. ◴[] No.43750686{3}[source]
5. latexr ◴[] No.43750826[source]
> unreadable once written (to me anyway). (…) there might be more inputs upon which it has unintended effects.

https://regex101.com can explain your regex back to you, and allows you to test it with more inputs.

Though I’m not trying to convince you to always use regular expressions, I agree with GP:

> Obviously regexes aren't the right tool for every job, and they can certainly be done poorly; but in the right place at the right time they're the simplest, most robust, easiest to understand solution to the problem.

6. bazoom42 ◴[] No.43751127[source]
> I tend to prefer to write out the procedural code, even if it is (much) longer in terms of lines.

This might work for you, but in general the amount of bugs is proportional to the amount of code. The regex engine is already thoroughly tested by someone else, while a custom implementation in procedural code will probably have bugs and be a lot more work to maintain if the pattern changes.

replies(3): >>43751445 #>>43753974 #>>43765539 #
7. jcelerier ◴[] No.43751152[source]
What makes them unreadable to you? 99% of the time you can just read them character by character, with maybe some groups and back references.
replies(1): >>43754116 #
8. justin66 ◴[] No.43751445{3}[source]
> This might work for you, but in general the amount of bugs is proportional to the amount of code.

If you wanted to look for cases which serve as an exception to this rule, code relying on regexes would be an excellent place to start.

9. bena ◴[] No.43751569[source]
Kind of fair.

I don't incorporate a lot of regular expressions into my code. But where I do like them is for search and replace. So I do treat them as mostly disposable.

10. rusk ◴[] No.43751927[source]
These are all valid criticisms of regex, but they’re not an excuse to avoid regex. Similarly, git has many warts but there’s no getting around it. Same with CSS.

If you want to run with the herd though you need to know these things, even enjoy them.

You can rely on tooling and training wheels like Python VERBOSE but you’re never going to get away from the fact that the “rump” of the population works with them.

Easier to bite the bullet and get practised. I’ve no doubt you have the intellect - you only need be convinced it’s a good use of your time.

11. rerdavies ◴[] No.43753974{3}[source]
In general, the correctness of the code is proportional to its readability.

I also prefer procedural code instead of regexes.

replies(1): >>43755629 #
12. bluecheese452 ◴[] No.43754116{3}[source]
I don’t think this is a particularly useful question. If they could accurately describe what exactly is confusing they wouldn’t be confused.
13. bazoom42 ◴[] No.43755629{4}[source]
Surely complexity is a factor? A procedural implementation will necessarily have the same essential complexity as the regex it replaces, but then it will additionally have a bunch of incidental complexity in matching and looping and backtracking.

Regexes can certainly be hard to read - the solution is to use formatting and comments to make them easier to understand - not to drown the logic in reams of boilerplate code.
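
A minimal sketch of the formatting-and-comments approach, using Python's re.VERBOSE flag (the one mentioned elsewhere in the thread). The date pattern and the name DATE_RE are hypothetical examples, not taken from anyone's code.

```python
import re

# In VERBOSE mode, whitespace in the pattern is ignored and "#" starts a comment,
# so each piece of the pattern can sit on its own annotated line.
DATE_RE = re.compile(
    r"""
    (?P<year>  \d{4} )                    # four-digit year
    -                                     # literal separator
    (?P<month> 0[1-9] | 1[0-2] )          # month 01-12
    -                                     # literal separator
    (?P<day>   0[1-9] | [12]\d | 3[01] )  # day 01-31 (no per-month check)
    """,
    re.VERBOSE,
)

m = DATE_RE.fullmatch("2023-04-01")
print(m.group("year"), m.group("month"), m.group("day"))
print(DATE_RE.fullmatch("2023-13-01"))  # month out of range, so no match
```

The comment on the day group also records a known limitation (it accepts "2023-02-31"), which is the kind of intent that is hard to recover from an uncommented one-liner.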

replies(1): >>43765553 #
14. kelafoja ◴[] No.43765539{3}[source]
That is quite a generalization. The regex engine is tested, but my specific regular expression isn't. My ability to write correct regular expressions is weak, so there can be many bugs in the one line of regular expression.
replies(2): >>43822087 #>>43822524 #
15. kelafoja ◴[] No.43765553{5}[source]
> A procedural implementation will necessarily have the same essential complexity as the regex it replaces

I don't think I fully agree with this, and I don't see a basis for why this should be true. If I have a very specific implementation, it could have very little incidental complexity, it could be fully targeted to the use case. Whereas with regular expressions there is incidental complexity of the regex engine itself by definition.

replies(1): >>43771185 #
16. kelafoja ◴[] No.43765568{3}[source]
You know that there are more friendly sounding ways to give this suggestion, right?
17. bazoom42 ◴[] No.43771185{6}[source]
Complexity in the standard library is not that relevant. If you make your own custom dictionary implementation, you increase complexity of your code base compared to just using the one in the standard library, even if your own implementation is simpler.

The relevant complexity for using a regex is the complexity of the pattern itself and the complexity of invoking the regex. Any custom procedural solution will be more complex unless it is literally something as simple as checking whether a string contains a given literal string.

replies(1): >>43792521 #
18. rerdavies ◴[] No.43792521{7}[source]
For some arbitrary definition of complex.
19. ◴[] No.43822087{4}[source]
20. bazoom42 ◴[] No.43822524{4}[source]
If you have made a bug in the specification of the pattern to match, then you will have the same bug in the hand-rolled implementation of the matching. It will just be more difficult to find the bug since the pattern is not explicitly specified anymore.