
Regex Isn't Hard (2023)

(timkellogg.me)
75 points asicsp | 2 comments
1. m463 No.43758574
Regexes are powerful, useful and needlessly hard to use.

But not because of the regex idea itself.

It's the quoting.

People don't properly learn how to use a regex because they are insulated from it by whatever language they are using.

It's literally like those surgeons who do heart surgery starting at a vein in your leg.

I use regexes all the time, in emacs, python, perl, bash, sed, awk, grep and more...

and just about every time the regex syntax is mixed with single quotes, double quotes, backslashes, $variable names and more from the "enclosing language or tool".

If I have a parenthesis or $, I'm always wondering whether it is part of the enclosing language, the matching pattern, or the literal. The kind of regex (basic or extended?) adds to the confusion.
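To make the three layers concrete, here is a minimal Python sketch (the variable names and the sample text are made up for illustration). The same `$` character shows up once as data coming from the enclosing language, once as a literal to match, and once as a regex anchor:

```python
import re

# Enclosing-language layer: an ordinary Python variable holding text.
currency = "$"

# Literal layer: re.escape turns "$" into "\$" so the regex engine
# matches a dollar sign instead of treating it as an anchor.
literal = re.escape(currency)

# Pattern layer: a capture group, a digit class, and "$" as end-of-string.
pattern = literal + r"(\d+)$"

match = re.search(pattern, "total: $42")
amount = match.group(1)

# Without escaping, the leading "$" is read as an anchor, and no digits
# can follow the end of the string, so nothing matches.
unescaped = re.search(currency + r"(\d+)$", "total: $42")
```

Here `amount` is `"42"` and `unescaped` is `None`, even though both patterns look nearly identical at the call site.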

I think it would be nice to have a syntax highlighter that would help with this, independent of language: green for a variable or other language construct, red for regex pattern, white for matching literal.

replies(1): >>43759153 #
2. recursivecaveat No.43759153
Wait until somebody uses string templating to insert something that ends with a backslash, changing the meaning of following characters from what the syntax highlighting thinks; a curse be upon that person.
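A small Python sketch of that hazard, assuming a hypothetical `prefix` value that ends in a backslash and a pattern built by plain string concatenation:

```python
import re

# A prefix that ends with a single backslash. It is written with "\\"
# because a Python raw string cannot end in an odd number of backslashes.
prefix = "dir\\"

# Naive templating: the trailing backslash escapes the "[" that follows,
# so "[0-9]+" stops being a character class.
naive = prefix + "[0-9]+"

# Escaping the inserted text first keeps the class intact.
safe = re.escape(prefix) + "[0-9]+"

target = r"dir\123"

naive_hit = re.search(naive, target)            # no match
safe_hit = re.search(safe, target)              # matches dir\123
odd_hit = re.fullmatch(naive, "dir[0-9]")       # matches the literal text
```

A highlighter looking only at `"[0-9]+"` would colour it as a character class in both cases; the meaning only changes once the inserted value is known at run time.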

Escaping/quoting is such a mud pile everywhere because it's in-band communication, but nobody would tolerate making everything out-of-band because it's too tedious. At least newer languages are getting better with things like 'raw' strings or Rust's arbitrarily long delimiters, but I'd still like more control.
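As a rough illustration of what raw strings buy you, here is a Python sketch (Python's `r"..."` prefix stands in for the general idea; Rust's `r#"..."#` form goes further by letting you add `#` characters to the delimiter):

```python
import re

# Ordinary string: Python itself consumes "\b" and produces a backspace
# character before the regex engine ever sees the pattern.
cooked = "\bcat\b"

# Raw string: the backslashes reach the regex engine, where "\b" means
# a word boundary.
raw = r"\bcat\b"

# The pre-raw-string workaround: double every backslash by hand.
doubled = "\\bcat\\b"

cooked_hit = re.search(cooked, "a cat sat")   # looks for backspace chars
raw_hit = re.search(raw, "a cat sat")         # finds the word "cat"
```

`cooked` is 5 characters long while `raw` is 7, and only the raw (or hand-doubled) version matches.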

I'm surprised I never see languages adopt directed delimiters like {my string} or something, since it lets you avoid escaping in the very common case of balanced internal delimiters.
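A minimal sketch of how such a literal could be scanned, assuming a hypothetical `read_braced` helper that is not taken from any real language implementation. Because the opening and closing delimiters differ, a depth counter is enough to find the end, and balanced inner braces need no escaping:

```python
def read_braced(text: str, start: int = 0) -> tuple:
    """Read a {...} literal beginning at text[start].

    Returns (content, index just past the closing brace).
    Hypothetical helper, for illustration only.
    """
    if start >= len(text) or text[start] != "{":
        raise ValueError("expected '{' at start")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Everything between the outermost braces is taken verbatim.
                return text[start + 1 : i], i + 1
    raise ValueError("unbalanced braces")


# A regex with its own braces and backslashes passes through untouched.
source = r"{\d{3}-\d{4}} rest"
content, end = read_braced(source)
remainder = source[end:]
```

Here `content` is `\d{3}-\d{4}` and `remainder` is `" rest"`. An unbalanced inner brace would still need some escape mechanism, which this sketch leaves out.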