I recommend taking a look at the ClickHouse SQL Lexer:
https://github.com/ClickHouse/ClickHouse/blob/master/src/Par...
https://github.com/ClickHouse/ClickHouse/blob/master/src/Par...
It uses SIMD to accelerate character matching, does not do any allocations, and is very small (it compiles to a few KB of WASM code).
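To give a rough idea of what SIMD character matching looks like, here is a minimal sketch. It is not the ClickHouse code: the function name `find_first_of_two` is made up for illustration, it assumes x86 with SSE2, and it uses the GCC/Clang builtin `__builtin_ctz`. It scans 16 bytes at a time for either of two characters (say, a closing quote or a backslash) and falls back to a plain loop for the tail.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)
#endif

// Hypothetical helper, for illustration only.
// Returns a pointer to the first byte in [begin, end) equal to c1 or c2,
// or `end` if neither occurs. Works on caller-owned memory; no allocations.
inline const char * find_first_of_two(const char * begin, const char * end, char c1, char c2)
{
    assert(begin <= end);
    const char * pos = begin;

#if defined(__SSE2__)
    // Broadcast each needle byte into all 16 lanes of a vector.
    const __m128i needle1 = _mm_set1_epi8(c1);
    const __m128i needle2 = _mm_set1_epi8(c2);

    // Process full 16-byte blocks; skipped entirely when fewer than 16 bytes remain.
    for (; end - pos >= 16; pos += 16)
    {
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pos));
        const __m128i eq = _mm_or_si128(_mm_cmpeq_epi8(chunk, needle1),
                                        _mm_cmpeq_epi8(chunk, needle2));
        // One bit per byte: bit i is set if byte i matched either needle.
        const int mask = _mm_movemask_epi8(eq);
        if (mask != 0)
            return pos + __builtin_ctz(static_cast<unsigned>(mask));  // lowest set bit = first match
    }
#endif

    // Scalar tail (and the whole scan on targets without SSE2).
    for (; pos < end; ++pos)
        if (*pos == c1 || *pos == c2)
            return pos;

    return end;
}
```

A lexer would call something like this after seeing an opening quote, e.g. `find_first_of_two(pos, end, '\'', '\\')`, and then handle the escape or the closing quote at the returned position. Note that in this sketch, inputs shorter than 16 bytes never enter the vector loop, so they go straight to the scalar loop.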
How much of an improvement does SIMD offer for something like this? It looks like it's only being used for strings and comments, but I would assume that in most programming languages, long strings and comments make up only a small proportion of the code. I'm also curious whether there's any performance penalty for attempting SIMD when most of the comments and strings are short.