←back to thread

27 points roggenbuck | 1 comments | | HN request time: 0.216s | source

I wanted a safer alternative to RegExp for TypeScript that uses a linear-time engine, so I built Regolith.

Why: Many CVEs happen because TypeScript libraries are vulnerable to Regular Expression Denial of Service attacks. I learned about this problem while doing undergraduate research and found that languages like Rust have built-in protection but languages like JavaScript, TypeScript, and Python do not. This library attempts to mitigate these vulnerabilities for TypeScript and JavaScript.

How: Regolith uses Rust's Regex library under the hood to prevent ReDoS attacks. The Rust Regex library implements a linear-time Regex engine that guarantees linear complexity for execution. A ReDoS attack occurs when a malicious input is provided that causes a normal Regex engine to check for a matching string in too many overlapping configurations. This causes the engine to take an extremely long time to compute the Regex, which could cause latency or downtime for a service. By designing the engine to take at most a linear amount of time, we can prevent these attacks at the library level and have software inherit these safety properties.

I'm really fascinated by making programming languages safer and I would love to hear any feedback on how to improve this project. I'll try to answer all questions posted in the comments.

Thanks! - Jake Roggenbuck

1. DemocracyFTW2 ◴[] No.45037026[source]
I have another nitpick and I hope it's a constructive one.

You provide some performance figures; unfortunately they are caught in an image, no doubt to enable color-coding the results. IMHO that's not ideal, tables should be pure text, even if only for accessibility with screen readers. There are other means to provide guiding highlights, like red and green Unicode code points. GitHub is somewhat unique in its strict policy to remove almost any kind of user-side styling from the READMEs, but providing a "photo snapshot" of parts of the README just to get some colors does not feel like the right solution.

Next thing are the actual figures you provide: those range from 11.822µs (best) to 56.534s (worst). They are displayed as

    11.822µs
    56.534s 
making them look almost like the worst performer took around five times as long as the best performer—until you realize there's a mu in there.

I must say that personally I remove this so-called "human-readable" format almost wherever I can because I find it not human-readable at all. To me a good numerical display should try and keep the decimal points on top of each other, avoid too many non-significant digits, use digit grouping, and, crucially, use a single unit throughout. With those constraints, the two figures become

            11.8µs
    56,534,000.0µs
which incidentally obviates much of the need to color code anything. One could discuss what unit—ns, µs, ms, s—is the most appropriate in the given context but, generally, I feel that big numbers should stand out as having many digits.

Nobody will pick this up because it's much too elaborate and idiosyncratic for this conformist world, but I just love the 'Japanese' way of formatting where you do digit grouping with the SI prefixes, so one hundred and twenty-five meters is 125m, but one thousand one hundred and twenty-five meters doesn't become 1,125m, nor is it 1.125km, but rather 1k125m (preferrably with a thin space as in 1k_125m—imagine a thin non-breakable space there that HN wouldn't let me render).

1G 255M 368k 799B, what's not to like?