180 points xnacly | 22 comments
1. sparkie ◴[] No.44562104[source]
As an alternative to the computed gotos, you can use regular functions with the `[[musttail]]` attribute in Clang or GCC to achieve basically the same thing - the call in tail position is compiled to a `jmp` instruction to the next function rather than to a label, and stack usage remains constant because the current frame is reused for the callee. `musttail` requires that the caller and callee have the same signature, and that the callee has a prototype.

You'd replace the JUMP_TARGET macro:

    #define JUMP_TARGET goto *jump_table[(int32_t)l->input.p[l->pos]]
With:

    #ifdef __clang__
    #define musttail [[clang::musttail]]
    #elif __GNUC__
    #define musttail [[gnu::musttail]]
    #else
    #define musttail
    #endif
    #define JUMP_TARGET return musttail jump_table[(int32_t)l->input.p[l->pos]](l, a, out)
Then move the jump table out to the top level and replace each `&&` with `&`.

See diff (untested): https://www.diffchecker.com/V4yH3EyF/

This approach has the advantage that it will work everywhere, not only on compilers that support computed gotos - it just won't be optimized on compilers that don't support `musttail`. (It has also been proposed to standardize the attribute in a future version of C.)

It might also work better with code navigation tools that show functions but not labels, and it enables modularity, since we can split rules over multiple translation units.

Performance-wise it should be basically the same - though it's been argued that it may do better in some cases, because the compiler's register allocator doesn't do a great job in large functions full of computed gotos, whereas with the musttail approach each function is a smaller unit that is optimized separately.

replies(4): >>44563630 #>>44567877 #>>44569195 #>>44572729 #
2. bestouff ◴[] No.44563630[source]
Can't wait for mandatory TCO coming to Rust. But it's not there yet. https://github.com/phi-go/rfcs/blob/guaranteed-tco/text/0000...
replies(1): >>44564605 #
3. sparkie ◴[] No.44564605[source]
Not sure I like the `become` keyword. Seems bizarre - someone encountering this word in code for the first time would have no idea what it's doing.

Why don't they just use `tailcall`? That would make it obvious what it's doing, because we've been using the term for nearly half a century, and the entire literature on the subject uses the term "tail call".

Even better would be to just automatically insert a tail call - like every other language that has supported tail calls for decades - provided the callee has the same signature as the caller. If it's undesirable because we want a stack trace, then instead have some keyword or attribute to suppress the tail call - such as `no_tail`, `nontail` or `donttail`.

Requiring tail calls to be marked will basically mean the optimization will be underutilized. Other than having a stack trace for debugging, there's basically no reason not to have the optimization on by default.

replies(5): >>44565254 #>>44566663 #>>44567717 #>>44568467 #>>44587305 #
4. kibwen ◴[] No.44565254{3}[source]
Rust does allow tail call optimization, but it's LLVM's decision whether to optimize tail calls, made on a case-by-case basis. An explicit syntax to denote tail calls would be the difference between tail call optimization and guaranteed tail call elimination, which is important because if you're writing a tail-recursive function, it's pretty trivial to blow the stack at any moderate recursion depth unless you can guarantee the elimination.

As for why it's not trivial for Rust to do this by default, consider the question of what should happen in the case of local destructors, which in an ordinary function would be called after `return myfunc()` returns, but in a tail-recursive function would need to be called beforehand. The proposals for `become` tend to handle this by making it a compiler error to have any locals with destructors in scope at the point of the tail-call, further motivating the explicit syntax.

replies(1): >>44568213 #
5. nixpulvis ◴[] No.44566663{3}[source]
I'm generally pretty conservative about adding keywords, but a tail call changes the semantics of the return, so it makes sense to change the word used in that position.
6. andrewflnr ◴[] No.44567717{3}[source]
As far as the name of the keyword: anyone who knows what "tail call" means will figure out "become" pretty quickly. If they don't get it from context clues, someone will just have to tell them, "oh, it's a tail call", and the confusion will dissolve, because "become" is really not a bad word for what happens in a tail call. (This is obviously less important than the implementation issues kibwen handled.)
7. teo_zero ◴[] No.44567877[source]
On compilers that don't support musttail, won't this make the stack explode?
replies(1): >>44568940 #
8. burnt-resistor ◴[] No.44568213{4}[source]
I looked into it. There's a crate for it: https://docs.rs/tailcall
replies(1): >>44572313 #
9. lmm ◴[] No.44568467{3}[source]
> Even better would be to just automatically insert a tail call - like every other language that has supported tail calls for decades - provided the callee has the same signature as the caller.

I've used Scala for many years and concluded this was a mistake. It makes it too easy to accidentally rely on an optimisation and then break your code much later with a seemingly innocuous change. Better to make it explicit.

replies(1): >>44570323 #
10. gsliepen ◴[] No.44568940[source]
AFAIK compilers will perform tail call optimization even without [[musttail]]; it's just not guaranteed (and it probably won't happen if you don't enable optimizations at all).
11. Sesse__ ◴[] No.44569195[source]
As an alternative to the computed gotos, you can use switch/case in Clang or GCC to achieve basically the same thing. :-) It becomes a jump table in most cases. (The article claims that a jump table gives smaller code and fewer branch misses, but it doesn't actually give any numbers, and enough things in there are dubious that I'm not convinced they ever measured.)

https://blog.nelhage.com/post/cpython-tail-call/ has made the rounds a lot recently, and explores this for Python's bytecode interpreter.

replies(1): >>44573205 #
12. Quekid5 ◴[] No.44570323{4}[source]
Just add the @tailrec annotation -- then the compiler will complain loudly if you break tail calls.
replies(1): >>44570442 #
13. lmm ◴[] No.44570442{5}[source]
Yes, the problem is if you didn't add it but the compiler was still silently TCOing your function, until one day it doesn't.
replies(1): >>44586326 #
14. celeritascelery ◴[] No.44572313{5}[source]
I am surprised this works reliably. I am interested to see what they are doing under the hood to guarantee tail call elimination.
15. bostick ◴[] No.44572729[source]
FYI, in my opinion, clang `[[musttail]]` is not quite ready for prime time. (cannot speak to GCC)

I was excited when it was introduced but quickly ran into issues.

Here is a silent miscompilation involving `[[musttail]]` that I reported in 2022 and is still open: https://github.com/llvm/llvm-project/issues/56435

16. sparkie ◴[] No.44573205[source]
The switch misses the point. The compiler isn't smart enough to convert it to direct-threading, to the best of my knowledge.

A switch only selects on one character; to continue lexing you need the switch inside a loop. The compiler might optimize the switch itself to a jump table, but each case then jumps back to the start of the loop, after which it enters the jump table again - two branches per character.

The point of direct threading is that there is no loop - you simply jump directly to the handler for the next character at the end of each handler.

replies(1): >>44573284 #
17. Sesse__ ◴[] No.44573284{3}[source]
> The compiler isn't smart enough to convert it to direct-threading, to the best of my knowledge.

If you read the URL I linked to, you will see that it is.

> The point of direct threading is that there is no loop - you simply jump directly to the handler for the next character at the end of each handler.

No, the point of direct threading is that you give the branch predictor more context to work with (effectively, the previous opcode), which was relevant with the branch predictors in typical CPUs 10+ years ago. (Modern ones, ITTAGE-style, have per-branch history also for indirect jumps.)

18. Quekid5 ◴[] No.44586326{6}[source]
Sure, but if you're relying on the TCO, you kind of have to actually state that, regardless? (At least if you want to avoid accidental regressions.)

I don't see any way around having to state your intent (as a programmer) for this. It's just a semantic difference in behavior unless your Abstract Machine allows for just ignoring the possibility of stack overflow entirely.

replies(1): >>44588679 #
19. dfawcus ◴[] No.44587305{3}[source]
FWIW, Alef also used a 'become' keyword for a tail-call version of 'return'.

And that in a quite C-like language.

20. lmm ◴[] No.44588679{7}[source]
> I don't see any way around having to state your intent (as a programmer) for this.

I want a way to state that my intent is to not rely on the TCO. Even a second explicit annotation for "do not TCO this function" would be better than what Scala currently does (but frankly TCO is surprising enough to debug that I think it should be opt-in rather than opt-out).

replies(1): >>44598924 #
21. Quekid5 ◴[] No.44598924{8}[source]
> I want a way to state that my intent is to not rely on the TCO.

How would that work? You want a nontail keyword to indicate that intent explicitly?

I guess I could imagine a scenario where you want to de-optimize a TCO into a non-TC... but I mean... that's got to be rare enough to just not bother with?

EDIT: Also, this is the exact opposite of the comment of yours that I replied to. Make up your mind on what you want, and maybe we can find a way to achieve it.

replies(1): >>44642872 #
22. lmm ◴[] No.44642872{9}[source]
> How would that work? You want a nontail keyword to indicate that intent explicitly?

Either that, or to not have TCO applied if I don't set a keyword (which I'd prefer).

> I guess I could imagine a scenario where you want to de-optimize a TCO into a non-TC... but I mean... that's got to be rare enough to just not bother with?

Sometimes I know that a given function will eventually need to be non-TC, in which case I want a way to make it non-TCO now. More commonly a junior team member just hasn't thought about TC-ness at all, in which case I'd rather the function not be TCOed and fail-fast than be TCOed until it isn't.

> EDIT: Also, this is the exact opposite of your comment which I replied to.

No it isn't. It's the detail of how it happens. If you use Scala at scale with a team that includes non-experts it will happen to you.