←back to thread

517 points petercooper | 1 comments | | HN request time: 1.374s | source
Show context
TazeTSchnitzel ◴[] No.8559029[source]
Pretty damn impressive.

Though I must wonder: how complete is it? What does it and does it not support? It's at least complete enough to be self-hosting, but beyond that? The code doesn't use that much of C.

replies(2): >>8559058 #>>8559068 #
cbhl ◴[] No.8559058[source]
Judging from the comments in c4.cpp, it probably only supports enough of a subset to compile itself.

Granted, while building a parser that can parse (let alone compiling) the full C language is nontrivial, any undergrad should be able to build a parser and compiler for a sufficiently simple subset of it. (In my undergrad, we used this subset to build a "compiler" in second year: https://www.student.cs.uwaterloo.ca/~cs241/wlp4/WLP4.html)

replies(2): >>8559064 #>>8560966 #
JoeAltmaier ◴[] No.8559064[source]
You can build a C parser in an afternoon. It only has a few language constructs. Declarations are the hardest. Scanners are readily available for expressions and constants.
replies(2): >>8559077 #>>8560875 #
cbhl ◴[] No.8559077[source]
I would be worried about handling the not-context-free parts of the language: http://trevorjim.com/c-and-cplusplus-are-not-context-free/

How would you handle these?

replies(1): >>8559095 #
JoeAltmaier ◴[] No.8559095[source]
Scanner has to have access to the symbol table, emit 'ident' or 'type' as appropriate. So not a pure context-free parser.
replies(1): >>8559385 #
sedachv ◴[] No.8559385[source]
Yes. This is the reason it is way easier to hand-build a parser for C than try to use any kind of parser generator.
replies(1): >>8559433 #
JoeAltmaier ◴[] No.8559433[source]
Bison and Flex work fine - do they count as 'parser generators'?
replies(1): >>8559824 #
sedachv ◴[] No.8559824[source]
Flex does not work fine, that's why you need the lexer hack: https://en.wikipedia.org/wiki/The_lexer_hack

Non-working (no lexer hack) Flex and Bison specification for C99: https://gist.github.com/codebrainz/2933703 653 lines

Hand-written recursive-descent C parser and pre-processor (something not handled by Lex/Yacc) that produces executable Common Lisp code: https://github.com/vsedach/Vacietis/blob/master/compiler/rea... 935 lines

replies(1): >>8559930 #
JoeAltmaier ◴[] No.8559930[source]
You can put whatever code you want into the Flex phrase rules. The 'lexer hack' can be implemented there. That's what I do whenever I have to parse something like C.
replies(1): >>8560693 #
sedachv ◴[] No.8560693[source]
Exactly. You have to drop hooks into the lexer from the parser, and by the time you're done you end up with just as much code. Only it's slower than a recursive-descent parser would be, and a lot of the time parsing speed really matters because it shows up as user-visible latency.
replies(1): >>8561086 #
vidarh ◴[] No.8561086[source]
While I have a strong dislike for lex/yacc and descendants, the "hooks" you need for C are trivial. As far as I remember the only thing you need is an ability for the lexer to check whether or not a given identifier is a variable or type.

Even that is only needed if you want to report more specific information up to the parser. E.g. Clang doesn't. In Clang it is instead the parser that looks up the information in order to figure out what type of identifier it has received.

replies(2): >>8561477 #>>8563641 #
1. JoeAltmaier ◴[] No.8563641[source]
And what are the alternatives? I've always wanted a parser object (in C++ anyway) that I can add constructions to, feed it scanner output and have it build a symbol table and semantic tree. Does such a thing exist?