
180 points xnacly | 2 comments
norir No.44562212
Lexing being the major performance bottleneck in a compiler is a great problem to have.
replies(3): >>44563135 #>>44568294 #>>44568430 #
norskeld No.44563135
Is lexing ever a bottleneck, though? Even if you push lexing and parsing to 10M lines/second [1], I'd argue that semantic analysis and codegen (for AOT-compiled languages) will dominate the timings.

That said, there's no reason not to squeeze every bit of performance out of it!

[1]: In this talk about the Carbon language, Chandler Carruth shows and explains some goals/challenges regarding performance: https://youtu.be/ZI198eFghJk?t=1462
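For scale, a rough back-of-envelope calculation (the ~40 bytes/line figure is an assumption, not from the talk) shows what 10M lines/second means in raw input bandwidth:

```go
package main

import "fmt"

func main() {
	// Hypothetical figures: the 10M lines/s goal mentioned above,
	// plus an assumed average of ~40 bytes per source line.
	const linesPerSec = 10_000_000
	const bytesPerLine = 40
	mbPerSec := linesPerSec * bytesPerLine / 1_000_000
	fmt.Println(mbPerSec, "MB/s") // hundreds of MB/s of raw input
}
```

At those rates the lexer is touching input at memory-bandwidth-adjacent speeds, which is part of why squeezing it further yields diminishing returns relative to later phases.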

replies(3): >>44563278 #>>44564311 #>>44568469 #
1. munificent No.44563278
It depends a lot on the language.

For a statically typed language, it's very unlikely that the lexer shows up as a bottleneck. Compilation time will likely be dominated by semantic analysis, type checking, and code generation.

For a dynamically typed language, where there isn't as much for the compiler to do, the lexer might be a more noticeable chunk of compile times. As one of the V8 folks pointed out to me years ago, the lexer is the only part of the compiler that has to operate on every single individual byte of input. Everything else gets the luxury of greater granularity, so the lexer can be worth optimizing.
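The byte-at-a-time point can be seen in a toy lexer hot loop (a minimal sketch, not V8's actual scanner; the token classes are simplified):

```go
package main

import "fmt"

// countTokens is a toy lexer loop: it must inspect every byte of the
// input exactly once, whereas later phases (parsing, analysis) operate
// on the much smaller stream of tokens it produces.
func countTokens(src []byte) int {
	tokens := 0
	i := 0
	for i < len(src) {
		c := src[i]
		switch {
		case c == ' ' || c == '\t' || c == '\n':
			i++ // whitespace: skipped, but still touched byte by byte
		case c >= '0' && c <= '9':
			for i < len(src) && src[i] >= '0' && src[i] <= '9' {
				i++
			}
			tokens++ // number literal
		case c == '_' || (c|0x20) >= 'a' && (c|0x20) <= 'z':
			for i < len(src) && (src[i] == '_' ||
				(src[i]|0x20) >= 'a' && (src[i]|0x20) <= 'z' ||
				src[i] >= '0' && src[i] <= '9') {
				i++
			}
			tokens++ // identifier or keyword
		default:
			i++ // punctuation: single-byte token
			tokens++
		}
	}
	return tokens
}

func main() {
	fmt.Println(countTokens([]byte("let x1 = 42 + y;"))) // 7 tokens
}
```

Every byte of `src` passes through the loop, so lexer throughput is bounded by per-byte work; this is why real scanners lean on tight dispatch tables and, sometimes, SIMD.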

replies(1): >>44580487 #
2. norskeld No.44580487
Ah, yes, that's totally fair. In the case of JS (in browsers) it's sort of a big deal, I suppose, even if the scripts being loaded are not render-blocking: the faster you lex and parse source files, the faster the page becomes interactive.

P.S. I absolutely loved "Crafting Interpreters" — thank you so much for writing it!