C4 – C in 4 functions

(github.com)

517 points petercooper | 3 comments | 04 Nov 14 20:19 UTC | HN request time: 0.681s | source

Show context

userbinator ◴[04 Nov 14 23:30 UTC] No.8559699[source]▶

From a cursory glance this appears to be a much-condensed recursive-descent parser with all of the usual parsing functions moved into one, so it's not all that difficult to understand. I think recursive-descent is one of the easiest to intuitively understand parsing algorithms, and it also makes for some concise code; it's also far easier to debug a recursive-descent parser than one of the traditional table-driven ones.

Edit: OTCC (http://www.bellard.org/otcc/ ) is another extremely tiny (to the point of being obfuscated) example of a compiler using a recursive-descent parser.

replies(1): >>8559992 #

1. barrkel ◴[05 Nov 14 01:00 UTC] No.8559992[source]▶

>>8559699 #

It's an operator precedence parser, which is actually quite a bit faster than recursive descent for expressions when operator precedence is defined using grammar productions that would otherwise get turned into methods.

For example, a simplified expression operator precedence can be expressed like this (where {} denotes repetition, implemented with e.g. a while loop in a recursive descent parser (RDP)):

    expr ::= term { '+' term | '-' term } ;
    term ::= factor { '*' factor | '/' factor } ;
    factor ::= '-' factor | <number> | '(' expr ')' ;

Typically this would be parsed in RDP using expr(), term() and factor() functions, where expr() calls term(), term() calls factor(), and factor() calls expr() if it sees an opening parenthesis.

The trouble is that this means that every single expression parse needs to call through this deep code flow before it can see a single interesting bit of the expression being parsed. Parsing "2 + 3" will result in 5 function calls: expr, term, factor, then term and factor again.

The operator precedence parser presented avoids this deeply nested set of calls before it starts parsing interesting things.

This problem is far more acute in a language like C, which has lots of precedence levels, than it is in a language like Pascal, which only has a handful.

replies(1): >>8560452 #

2. userbinator ◴[05 Nov 14 03:54 UTC] No.8560452[source]▶

>>8559992 (TP) #

I was contrasting it against table-driven parsers (e.g. typical of the Yacc/Bison type) which I believe don't have any recursive calls and use a separate stack, while this one does make recursive calls. In some ways operator precedence/precedence climbing is like an optimisation of RDP, replacing a set of recursive functions with a level counter and a loop.

replies(1): >>8561126 #

3. barrkel ◴[05 Nov 14 09:55 UTC] No.8561126[source]▶

>>8560452 #

I can view using the CPU stack instead of an explicit stack, and the CPU instruction pointer instead of an index into a state table, as optimizations too (direct encoding). There also exist recursive ascent parsers, which are the RDP analogue of table-driven LALR parsers you're talking about.

Point being, there are a lot of different parsing approaches, with pros and cons, and it can be useful to treat them separately rather than view them within only a couple of lenses.

↑