180 points xnacly | 3 comments
zX41ZdbW ◴[] No.44562278[source]
I recommend taking a look at the ClickHouse SQL Lexer:

https://github.com/ClickHouse/ClickHouse/blob/master/src/Par...

https://github.com/ClickHouse/ClickHouse/blob/master/src/Par...

It supports SIMD for accelerated character matching, it does not do any allocations, and it is very small (compiles to a few KB of WASM code).

replies(1): >>44564428 #
1. tuveson ◴[] No.44564428[source]
How much of an improvement does SIMD offer for something like this? It looks like it's only being used for strings and comments, but I would kind of assume that for most programming languages, the proportion of code that is long strings / comments is not large. Also curious if there's any performance penalty for trying to do SIMD if most of the comments and strings are short.
replies(2): >>44564666 #>>44570454 #
2. camel-cdr ◴[] No.44564666[source]
Lexing usually isn't a significant part of the performance equation compared to the other parts of the compiler, but SIMD can be used to speed up number parsing.
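As a rough sketch of the idea, the following converts eight ASCII digits to an integer with a handful of 64-bit operations instead of a per-character loop. This is the well-known SWAR ("SIMD within a register") trick rather than vector intrinsics, and it is not taken from any compiler mentioned in this thread. The names `load8`, `is_eight_digits`, and `parse_eight_digits` are hypothetical, and the code assumes a little-endian machine.

```cpp
#include <cstdint>
#include <cstring>

// Load 8 bytes without alignment or aliasing problems.
inline std::uint64_t load8(const char* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// True if all 8 bytes at p are ASCII digits '0'..'9'.
bool is_eight_digits(const char* p) {
    const std::uint64_t v = load8(p);
    const std::uint64_t high = 0xF0F0F0F0F0F0F0F0ULL;
    // A digit has high nibble 3, and still has high nibble 3 after adding 6.
    return ((v & high) | (((v + 0x0606060606060606ULL) & high) >> 4))
           == 0x3333333333333333ULL;
}

// Converts exactly 8 ASCII digits at p to their numeric value.
// Assumes little-endian byte order and that is_eight_digits(p) holds.
std::uint32_t parse_eight_digits(const char* p) {
    std::uint64_t v = load8(p) - 0x3030303030303030ULL;  // '0'..'9' -> 0..9
    // Combine adjacent digits: each byte becomes 10*d[i] + d[i+1].
    v = (v * 10) + (v >> 8);
    const std::uint64_t mask = 0x000000FF000000FFULL;
    const std::uint64_t mul1 = 100ULL + (1000000ULL << 32);
    const std::uint64_t mul2 = 1ULL + (10000ULL << 32);
    // Combine the four two-digit pairs; the result ends up in the top 32 bits.
    v = (((v & mask) * mul1) + (((v >> 16) & mask) * mul2)) >> 32;
    return static_cast<std::uint32_t>(v);
}
```

A lexer could call `is_eight_digits` while at least eight bytes of input remain and fall back to an ordinary digit loop for shorter or mixed runs. Whether that pays off depends on how common long numeric literals are in the input.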
3. Sesse__ ◴[] No.44570454[source]
Random data point: Implementing SIMD for tokenizing identifiers sped up the Chromium CSS parser (as a whole, not just the tokenizer) by ~2–3%.