
Regex Isn't Hard (2023)

(timkellogg.me)
75 points asicsp | 2 comments
thomasmg No.43750577
For me, the main problem with regex syntax is the escaping rules. Many characters require escaping: \ { } ( ) [ ] | * + ? ^ $ . The rules are also different inside square brackets. I think it would be better if literal text were enclosed in quotes; that way, much less escaping is needed, and the syntax would still be concise (sometimes even more concise). I tried to formulate a proposal here: https://github.com/thomasmueller/bau-lang/blob/main/RegexV2....
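To make the escaping point concrete, here is a minimal sketch using Python's standard `re` module (Python is only an assumption for illustration; the thread does not name a regex engine, and the helper `full` is a hypothetical convenience). It shows that the same character can need a backslash outside square brackets but not inside them, and that `re.escape` exists precisely because hand-escaping literals is error-prone.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

# Outside square brackets, '.' is a metacharacter and needs a backslash.
unescaped = full(r"a.c", "abc")      # '.' matches any character
escaped = full(r"a\.c", "abc")       # '\.' matches only a literal dot

# Inside square brackets the rules differ: '.', '*' and '+' are literal there,
in_class = full(r"[.*+]+", "+.*")
# while ']' and '\' need escaping, and '^' and '-' depend on their position.
class_specials = full(r"[\]\\^-]+", "]\\^-")

# re.escape applies the escaping rules to a literal string for you.
literal = "1+1=2 (really?)"
escaped_literal = full(re.escape(literal), literal)
```

Used unescaped as a pattern, the string `1+1=2 (really?)` does not even match itself, which is the kind of surprise a quoted-literal syntax aims to avoid.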
replies(1): >>43750685 #
1. lhamil64 No.43750685
One thing I noticed with the example `['0-9a-f']`:

Doesn't this go against the "literals are enclosed in quotes" idea? In this case, you have a special character (`-`) inside a quoted string. IMO this would be more consistent: `['0'-'9''a'-'f']`, maybe even with comma separation, like `['0'-'9','a'-'f']`. This would also let you include character classes, as in `[d,'a'-'f']`, although that might be a little confusing if you're used to normal regex.
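As a rough sketch of how the comma-separated form could be read (with balanced quotes, as in `['0'-'9','a'-'f']`), the following Python function translates it into a conventional character class. Everything here is an assumption for illustration: the function name `quoted_class_to_regex`, the accepted grammar (single quoted characters and quoted ranges only), and the error handling are hypothetical and are not taken from the linked proposal. Unquoted class names such as `d` are deliberately not handled.

```python
import re

# One item: a quoted character, optionally followed by -'x' to form a range,
# then a comma or the end of the class body. (Hypothetical grammar.)
_ITEM = re.compile(r"\s*'([^'])'(?:-'([^'])')?\s*(?:,|$)")

def quoted_class_to_regex(spec: str) -> str:
    """Translate e.g. ['0'-'9','a'-'f'] into the conventional class [0-9a-f]."""
    if len(spec) < 2 or spec[0] != "[" or spec[-1] != "]":
        raise ValueError("expected a bracketed class")
    body = spec[1:-1]
    parts = []
    pos = 0
    while pos < len(body):
        m = _ITEM.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse class at offset {pos}: {body[pos:]!r}")
        lo, hi = m.group(1), m.group(2)
        # re.escape keeps characters such as '-' or ']' literal in the output.
        parts.append(re.escape(lo) if hi is None else f"{re.escape(lo)}-{re.escape(hi)}")
        pos = m.end()
    if not parts:
        raise ValueError("empty class")
    return "[" + "".join(parts) + "]"

hex_class = quoted_class_to_regex("['0'-'9','a'-'f']")
```

Because the range operator sits outside the quotes, a quoted `'-'` is unambiguous: `quoted_class_to_regex("['-','a'-'f']")` yields a class in which the hyphen is escaped and only `a` to `f` forms a range.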

replies(1): >>43751066 #
2. thomasmg No.43751066
Thanks for reading and taking the time to respond!

> Doesn't this go against the "literals are enclosed in quotes" idea?

Sure, one could argue that other changes would also be useful, but then the syntax would be less concise. I think the main reasons people like regex are that it is (a) powerful and (b) concise.

For my V2 proposal, the new rule is "literals are enclosed in quotes"; the rule isn't "_only_ literals are enclosed in quotes" :-) In this case, I think `-` can be quoted as well. I wanted to keep the V2 syntax as close as possible to the existing syntax.