
27 points | roggenbuck | 1 comment

I wanted a safer alternative to RegExp for TypeScript that uses a linear-time engine, so I built Regolith.

Why: Many CVEs happen because TypeScript libraries are vulnerable to Regular Expression Denial of Service attacks. I learned about this problem while doing undergraduate research and found that languages like Rust have built-in protection but languages like JavaScript, TypeScript, and Python do not. This library attempts to mitigate these vulnerabilities for TypeScript and JavaScript.

How: Regolith uses Rust's Regex library under the hood to prevent ReDoS attacks. The Rust Regex library implements a linear-time Regex engine that guarantees linear complexity for execution. A ReDoS attack occurs when a malicious input is provided that causes a normal Regex engine to check for a matching string in too many overlapping configurations. This causes the engine to take an extremely long time to compute the Regex, which could cause latency or downtime for a service. By designing the engine to take at most a linear amount of time, we can prevent these attacks at the library level and have software inherit these safety properties.
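To make the "too many overlapping configurations" point concrete, here is a minimal sketch, assuming a backtracking engine such as the one behind JavaScript's built-in RegExp. The pattern `/^(a+)+$/` and the helper name `timeMatch` are illustrative choices, not something taken from Regolith. The input is a run of `a` characters followed by one character that forces the match to fail, so the engine tries many ways of splitting the run between the inner and outer `+` before giving up.

```typescript
// Nested quantifiers: the run of "a" can be divided between the inner
// and outer "+" in many different ways.
const vulnerable = /^(a+)+$/;

// Hypothetical helper: time one failing match against n "a" characters
// followed by a "!" that prevents the pattern from ever matching.
function timeMatch(n: number): { matched: boolean; ms: number } {
  const input = "a".repeat(n) + "!";
  const start = Date.now();
  const matched = vulnerable.test(input);
  return { matched, ms: Date.now() - start };
}

// Input sizes are kept small on purpose so the script finishes quickly.
for (const n of [10, 14, 18, 22]) {
  const { matched, ms } = timeMatch(n);
  console.log(`n=${n} matched=${matched} ms=${ms}`);
}
```

On a backtracking engine the reported time should grow much faster than the input length, although the exact numbers depend on the runtime and machine. A linear-time engine is designed so that the same input costs time proportional to its length.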

I'm really fascinated by making programming languages safer and I would love to hear any feedback on how to improve this project. I'll try to answer all questions posted in the comments.

Thanks! - Jake Roggenbuck

semiquaver No.45035198

  > Regolith attempts to be a drop-in replacement for RegExp and requires minimal (to no) changes to be used instead
vs

  > Since Regolith uses Rust bindings to implement the Rust Regex library to achieve linear time worst case, this means that backreferences and look-around aren't available in Regolith either.
Obviously it cannot be a drop-in replacement if the regex dialect differs. That it has a compatible API is not the only relevant factor. I’d recommend removing the top part from the readme.

Another thought: since backreferences and lookaround are the features in JS regexes which _cause_ ReDoS, why not just wrap vanilla JS regex, rejecting patterns including them? Wouldn’t that achieve the same result in a simpler way?

roggenbuck No.45035264
Thanks for the feedback! Yea, you're totally right. I'll update the docs to reflect this.

> why not just wrap vanilla JS regex, rejecting patterns including them?

Yea! I was thinking about this too actually. And this would solve the problem of being server side only. I'm thinking about making a new version to do just this.

For a pattern-rejecting wrapper, how would you want it to communicate that an unsafe pattern has been created?

1. DemocracyFTW2 ◴[] No.45036058[source]
> how would you want it to communicate that an unsafe pattern has been created

Given this is running on a JS engine, an error should be thrown, much as an error is thrown on syntactically invalid regexes in the source. Sadly, this can't happen at module load / compile time unless a build step is implemented, complicating the matter; but on the other hand, a regex that is never used cannot be a problem either. The build step could be stupidly simple, such as relying on an otherwise disallowed construction like `safe/[match]*me/`.
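
A rough sketch of the runtime variant of this idea, assuming the wrapper throws a `SyntaxError` when the pattern is constructed. The names `findRejectedFeature` and `safeRegExp` are hypothetical and not part of Regolith. The scanner is deliberately conservative: it treats any `\1` to `\9` or `\k<` outside a character class as a backreference, and it does not handle nested character classes under the `v` flag. It only implements the rejection step discussed in this thread and does not check the worst-case running time of the patterns it accepts.

```typescript
// Hypothetical scanner: returns a description of the first rejected
// feature in the pattern source, or null if none is found.
function findRejectedFeature(source: string): string | null {
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    if (ch === "\\") {
      const next = source.charAt(i + 1);
      if (!inClass && next >= "1" && next <= "9") return "a numbered backreference";
      if (!inClass && next === "k" && source.charAt(i + 2) === "<") return "a named backreference";
      i++; // skip the escaped character
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
      continue;
    }
    if (ch === "[") {
      inClass = true;
      continue;
    }
    if (ch === "(" && source.charAt(i + 1) === "?") {
      const c = source.charAt(i + 2);
      if (c === "=" || c === "!") return "a lookahead";
      const two = source.slice(i + 2, i + 4);
      if (two === "<=" || two === "<!") return "a lookbehind";
    }
  }
  return null;
}

// Hypothetical wrapper: throws at construction time, the same moment at
// which `new RegExp` would throw for a syntactically invalid pattern.
function safeRegExp(source: string, flags = ""): RegExp {
  const feature = findRejectedFeature(source);
  if (feature !== null) {
    throw new SyntaxError(`Rejected pattern /${source}/: it uses ${feature}`);
  }
  return new RegExp(source, flags);
}
```

Calling `safeRegExp("(a)\\1")` throws, while `safeRegExp("^[a-z]+$")` returns an ordinary `RegExp`. Because the check runs when the pattern is constructed, a pattern that is never constructed never triggers it, in line with the point above.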