http://www.amazon.com/Compilers-Principles-Techniques-Tools-...
I highly recommend it, but it's heavy stuff. There are probably simpler guides out there that just cover the basics.
http://compilers.iecc.com/crenshaw/tutor8.txt
The main point for our discussion is last:
> Desire for Generality
> We have been concentrating on the use of a recursive-descent parser to parse a deterministic grammar, i.e., a grammar that is not ambiguous and, therefore, can be parsed with one level of lookahead.
> In practice, these issues turn out to be considerably less important. Modern languages tend to be designed to be easy to parse, anyway. That was a key motivation in the design of Pascal. Sure, there are pathological grammars that you would be hard pressed to write unambiguous BNF for, but in the real world the best answer is probably to avoid those grammars!
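To make the "one level of lookahead" point concrete, here's a minimal sketch of a Crenshaw-style recursive-descent parser in OCaml. The grammar and all the names are my own illustration, not code from the tutorial: each nonterminal becomes a function, and the single lookahead token is simply the head of the remaining token list.

    (* Sketch only: expr ::= term (('+' | '-') term)*
                    term ::= NUMBER | '(' expr ')'          *)
    type token = Num of int | Plus | Minus | LParen | RParen

    let rec expr toks =
      let v, rest = term toks in
      loop v rest

    (* one token of lookahead decides whether to keep consuming terms *)
    and loop acc = function
      | Plus :: rest ->
          let v, rest' = term rest in
          loop (acc + v) rest'
      | Minus :: rest ->
          let v, rest' = term rest in
          loop (acc - v) rest'
      | toks -> (acc, toks)

    and term = function
      | Num n :: rest -> (n, rest)
      | LParen :: rest ->
          (let v, rest' = expr rest in
           match rest' with
           | RParen :: rest'' -> (v, rest'')
           | _ -> failwith "expected ')'")
      | _ -> failwith "expected number or '('"

    (* expr [Num 1; Plus; Num 2; Minus; Num 3]  ==>  (0, [])  *)

Crenshaw emits code as he parses instead of evaluating, but the shape of the parser is the same: one function per grammar rule, no backtracking.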
Here's a bit more of what he said, heavily snipped:
> Limited RAM Forcing Multiple Passes
> All the early compiler writers had to deal with this issue: Break the compiler up into enough parts so that it will fit in memory. When you have multiple passes, you need to add data structures to support the information that each pass leaves behind for the next. That adds complexity, and ends up driving the design.
> Batch Processing
> In a mainframe compiler as well as many micro compilers, considerable effort is expended on error recovery ... it can consume as much as 30-40% of the compiler and completely drive the design. The idea is to avoid halting on the first error, but rather to keep going at all costs, so that you can tell the programmer about as many errors in the whole program as possible.
> Large Programs
[Basically comes down to "This is 1980s Micro Land. We don't have mmap or virtual memory in general. The simple way is to keep everything in RAM and encourage small subroutines."]
> Emphasis on Efficiency
[This is an interesting point. He says that we have fast enough CPUs that compilers can emit sub-optimal code and it doesn't matter.]
> Limited Instruction Sets
[Eh. I don't agree that limited instruction sets make compilers more complicated. Even in his design, code generation was a rather trivial part of the entire program.]
Modern code generation has moved on a bit, so I wouldn't dig too deeply into the latter third or so of the book.
All in all, for a hobby compiler it would be a poor choice: heavy on unnecessary theory in the front end and outdated on back-end stuff.
Let's Build a Compiler by Jack Crenshaw is one of my favourite guides for the hobbyist, although it's a bit dated now since it used Turbo Pascal and targeted 16-bit x86.
That said, I typically recommend writing compilers in an ML or a LISP, since it's so much easier in those languages. Ocaml and Racket are my main recommendations for modern work. One can also use techniques for deriving imperative code from functional programs if one wants to re-implement the compiler in a specific imperative language on a given machine. The constructs are even straightforward enough for macro assembly, for those who want to understand it down to bare metal.
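To give a flavor of why, here's a hypothetical sketch in OCaml (the stack-machine instruction set is invented for illustration, not anyone's real compiler): the AST is an algebraic data type, and a whole code-generation pass is a single structurally recursive pattern match.

    (* The AST as an algebraic data type *)
    type expr =
      | Int of int
      | Add of expr * expr
      | Mul of expr * expr

    (* A made-up stack-machine instruction set *)
    type instr = Push of int | IAdd | IMul

    (* One pass: a post-order walk that emits stack code.
       The compiler ensures the match covers every constructor. *)
    let rec compile = function
      | Int n      -> [Push n]
      | Add (a, b) -> compile a @ compile b @ [IAdd]
      | Mul (a, b) -> compile a @ compile b @ [IMul]

    (* compile (Add (Int 1, Mul (Int 2, Int 3)))
         ==> [Push 1; Push 2; Push 3; IMul; IAdd]  *)

Pattern matching plus exhaustiveness checking is the whole trick: add a new AST node and the compiler tells you every pass you forgot to update.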
Grune et al., "Modern Compiler Design",
Appel, "Modern Compiler Implementation in ML",
and many more.
P.S. Because of a slow-ban, adding another one to this answer. This is the best source on SSA:
http://www.cri.ensmp.fr/people/pop/papers/2006-12-thesis.pdf
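For anyone new to SSA, the core idea fits in a few lines: every variable is assigned exactly once, and values that depend on control flow are merged with phi nodes. Here's a rough OCaml sketch of such an IR (the types are my own illustration, not the thesis's notation):

    type var = string
    type label = string

    type rhs =
      | Const of int
      | Add of var * var
      (* a phi node picks a value based on which
         predecessor block control arrived from *)
      | Phi of (label * var) list

    type instr = Assign of var * rhs

    (* The join block for:
         if (...) x1 = 1 else x2 = 2;
         x3 = phi(x1, x2);  y = x3 + x3          *)
    let join_block : instr list =
      [ Assign ("x3", Phi [ ("then", "x1"); ("else", "x2") ]);
        Assign ("y",  Add ("x3", "x3")) ]

Because each name has exactly one definition, use-def chains become trivial, which is what makes most of the classic optimizations so much simpler on SSA.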
There is an old but good book, often overlooked: "Functional Programming" by Anthony J. Field and Peter G. Harrison (1988). Despite the title, it's more about various compilation techniques.
Also old but still relevant (it's missing the STG machine, but otherwise full of interesting stuff): http://research.microsoft.com/en-us/um/people/simonpj/papers...
It's also worth following this blog: https://wingolog.org/
[1] http://www.amazon.ca/Crafting-Compiler-Charles-N-Fischer/dp/...
As far as safety goes, you can build checks into your code for protection or use a typed subset of LISP (e.g. Typed Racket). Shen goes further by embedding a whole sequent calculus for custom, per-app/module type systems.
So, I'm not saying LISP is ideal for compilers, but it does have some genuinely good attributes. Ocaml is my default given it combines conciseness, correctness, decent performance, ease of learning, and some helpful libs.
But if a person is merely looking to bang out a compiler without getting overwhelmed with how to convert NFAs to DFAs for lexing, etc., some good alternative books are:
A Retargetable C Compiler: Design and Implementation, by Hanson and Fraser (http://www.amazon.com/Retargetable-Compiler-Design-Implement...). This book constructs, documents, and explains the code for a full C compiler with a recursive descent approach (no flex/lex or bison/yacc). I have some experience augmenting this compiler, so I can vouch for the book's ability to clearly convey their design.
Compiler Design in C, by Allen Holub (http://www.holub.com/software/compiler.design.in.c.html). Downloadable PDF at that link as well. A book from 1990 in which Holub constructs his own version of lex and yacc, and then builds a subset-C compiler which generates intermediate code.
Practical Compiler Construction, by Nils Holm (http://www.lulu.com/shop/nils-m-holm/practical-compiler-cons...). A recent book which documents the construction of a SubC (subset of C) compiler and generates x86 code on the back end.
The other two are new to me. Thanks for those.
The lecture notes are simple and excellent, though at a very high level of abstraction. On the other hand, the notes are full of references to the original material, which is super nice.
See a relevant discussion here from not long ago:
As for parsing, as I already said elsewhere in this thread, the techniques from the Dragon Book are not practical any more and are not used in modern compilers. There are far better ways, which are not covered in the book, and they're far simpler, not even deserving a dedicated book at all.
Although, to be fair, PEG (Packrat) parsers do support left recursion [1], and they can also be combined with Pratt parsing for binary expressions very efficiently, in O(n).
Unfortunately it sometimes generates the wrong parse: http://tratt.net/laurie/research/pubs/papers/tratt__direct_l...
Left-recursive PEG should be reserved for things like method calls, structure field access, and array indexing, which are fine with that algorithm.
Meanwhile, you can take a look at my implementation of such a mix: https://github.com/combinatorylogic/mbase/tree/master/src/l/...
An example of a grammar built on top of it: https://github.com/combinatorylogic/clike/blob/master/clike/... - see the `binary` nodes there, they're translated into Pratt while all the others are Packrat.
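For flavor, here's a hedged sketch in OCaml of the Pratt half of such a mix (all the names are mine; in the real thing the primaries would come from the Packrat side): binary operators are parsed by binding power in a single O(n) loop, with no left-recursion problem at all.

    type token = Num of int | Op of char
    type expr = Lit of int | Bin of char * expr * expr

    (* binding power of each binary operator *)
    let power = function '+' -> 10 | '-' -> 10 | '*' -> 20 | _ -> 0

    (* parse a primary (this is where Packrat would plug in) *)
    let primary = function
      | Num n :: rest -> (Lit n, rest)
      | _ -> failwith "expected number"

    (* climb while the next operator binds at least as tightly as min_bp *)
    let rec expr_bp min_bp toks =
      let lhs, rest = primary toks in
      climb lhs min_bp rest

    and climb lhs min_bp = function
      | Op o :: rest when power o >= min_bp ->
          (* left-associative: parse the rhs at power o + 1 *)
          let rhs, rest' = expr_bp (power o + 1) rest in
          climb (Bin (o, lhs, rhs)) min_bp rest'
      | toks -> (lhs, toks)

    (* expr_bp 0 [Num 1; Op '+'; Num 2; Op '*'; Num 3]
         ==> (Bin ('+', Lit 1, Bin ('*', Lit 2, Lit 3)), [])  *)

Each token is consumed exactly once and precedence is just a table lookup, which is where the O(n) comes from.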
Working from an example like LLVM, or reading Appel's or Wirth's books, is a better approach.
So that's static typing (Typed Racket) and macros for sure, and possibly ADTs, depending on whether you object to the claim above. That should cover your list of requirements.