(xnacly.me)

180 points xnacly | 3 comments | 14 Jul 25 14:42 UTC

zX41ZdbW ◴[14 Jul 25 16:48 UTC] No.44562278[source]▶

I recommend taking a look at the ClickHouse SQL Lexer:

https://github.com/ClickHouse/ClickHouse/blob/master/src/Par...

It supports SIMD for accelerated character matching, it does not do any allocations, and it is very small (compiles to a few KB of WASM code).
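For readers unfamiliar with the idea, here is a minimal sketch of SIMD-accelerated character matching without allocations. It is not taken from the ClickHouse lexer; the function name `find_quote_or_backslash` and the choice of SSE2 are assumptions made for illustration. It scans 16 bytes at a time for the characters that can end or escape a string literal, and falls back to a plain byte loop for short inputs, tails, and non-SSE2 targets.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: returns the index of the first '"' or '\\' in
 * buf[0..len), or len if neither occurs. Works on the caller's buffer,
 * so it performs no allocation. */
size_t find_quote_or_backslash(const char *buf, size_t len) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i quote = _mm_set1_epi8('"');
    const __m128i backslash = _mm_set1_epi8('\\');
    /* Compare 16 bytes per iteration against both needles. */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i hit = _mm_or_si128(_mm_cmpeq_epi8(chunk, quote),
                                   _mm_cmpeq_epi8(chunk, backslash));
        int mask = _mm_movemask_epi8(hit); /* one bit per matching byte */
        if (mask != 0) {
            /* __builtin_ctz is a GCC/Clang builtin (assumed toolchain). */
            return i + (size_t)__builtin_ctz((unsigned)mask);
        }
    }
#endif
    /* Scalar path: inputs shorter than 16 bytes, the tail, or no SSE2. */
    for (; i < len; i++) {
        if (buf[i] == '"' || buf[i] == '\\') {
            return i;
        }
    }
    return len;
}
```

A lexer would call this right after consuming an opening quote, then inspect the byte at the returned index to decide whether it hit the closing quote or an escape sequence.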

replies(1): >>44564428 #

1. tuveson ◴[14 Jul 25 19:41 UTC] No.44564428[source]▶

>>44562278 #

How much of an improvement does SIMD offer for something like this? It looks like it's only being used for strings and comments, but I would kind of assume that for most programming languages, the proportion of code that is long strings / comments is not large. Also curious if there's any performance penalty for trying to do SIMD if most of the comments and strings are short.

replies(2): >>44564666 #>>44570454 #

2. camel-cdr ◴[14 Jul 25 20:07 UTC] No.44564666[source]▶

>>44564428 (TP) #

Usually lexing isn't part of the performance equation compared to all the other parts of the compiler, but SIMD can be used to speed up number parsing.
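As a rough illustration of what vectorized number parsing can look like, the sketch below uses SWAR ("SIMD within a register") on a single 64-bit integer rather than real vector instructions, so it stays portable. The function names are hypothetical, and the code assumes a little-endian machine and exactly eight ASCII digits in the input; it is one widely used trick, not necessarily what the commenter had in mind.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes without alignment or aliasing problems. */
static uint64_t load8(const char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 if all 8 bytes at p are ASCII digits '0'..'9', else 0. */
int is_eight_digits(const char *p) {
    uint64_t v = load8(p);
    /* A digit has high nibble 3, and still has it after adding 6. */
    return ((v & 0xF0F0F0F0F0F0F0F0ULL) |
            (((v + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4))
           == 0x3333333333333333ULL;
}

/* Converts exactly 8 ASCII digits at p to their numeric value.
 * Assumes little-endian byte order and that is_eight_digits(p) holds. */
uint32_t parse_eight_digits(const char *p) {
    const uint64_t mask = 0x000000FF000000FFULL;
    const uint64_t mul1 = 0x000F424000000064ULL; /* 100 + (1000000 << 32) */
    const uint64_t mul2 = 0x0000271000000001ULL; /* 1 + (10000 << 32) */
    uint64_t v = load8(p);
    v -= 0x3030303030303030ULL;  /* '0'..'9' -> 0..9 in every byte */
    v = (v * 10) + (v >> 8);     /* even bytes now hold two-digit pairs */
    /* Combine the four pairs with their decimal weights in one step. */
    v = (((v & mask) * mul1) + (((v >> 16) & mask) * mul2)) >> 32;
    return (uint32_t)v;
}
```

A number lexer could consume digits eight at a time with this pair of functions while `is_eight_digits` holds, then finish the remaining digits with an ordinary per-character loop.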

3. Sesse__ ◴[15 Jul 25 12:29 UTC] No.44570454[source]▶

>>44564428 (TP) #

Random data point: Implementing SIMD for tokenizing identifiers sped up the Chromium CSS parser (as a whole, not just the tokenizer) by ~2–3%.

Strategies for Fast Lexers