143 points todsacerdoti | 11 comments
1. tiu ◴[] No.43594898[source]
I just finished writing 2 (well, 2.5) toy compilers, and I think a lot of the material in the compiler-teaching space lacks some subtle developmental aspects.

I wish there were a course somewhere that covered the more ingrained issues: how to structure/design the AST[0], buffer-based vs. noncontextual tokenization/parser design, index handling and error sync in the parser, supporting multiple codegen architectures, handling FFI, exposing the compiler as an API for external tooling, namespaces and linkage handling, etc.

It is refreshing to see how Carbon designed some of its components (mainly the frontend; I have yet to look at the backend), as it touches on some of the subtleties I mentioned. If someone is starting out on writing a compiler, I would recommend taking a look at it or at any of the talks.

Always nice to see new material coming up. A few resources that I would like to mention are dabeaz's compiler course, Khoury College's compiler course (in Rust; previously, I think, in OCaml), Nora Sandler's book, as well as http://compilerbook.org; which I consider to be the best guide out there for writing small learning compilers. The videos are good as well.

[0]: Some related content that I enjoyed reading: https://lesleylai.info/en/ast-in-cpp-part-1-variant/

replies(3): >>43595603 #>>43595652 #>>43602829 #
2. seanmcdirmid ◴[] No.43595603[source]
You can learn all that stuff by doing and by looking at other people’s designs. It’s a bit niche to package that up into an advanced language design/compiler class; everyone is making different trade-offs and so focusing on different things. There also isn’t really a market for more than a few highly trained language designers and implementers, and most wind up doing different things than what they were trained for.
3. tester756 ◴[] No.43595652[source]
>supporting multiple codegen architectures

Shouldn't it be a result of modular software design?

>exposing the compiler as an API for external tooling

Isn't it just generating libraries instead of executables?

replies(2): >>43595906 #>>43596118 #
4. remexre ◴[] No.43595906[source]
Making the _compiler itself_ provide an API enables things like LSP, which don't want to generate machine code at all. A traditional single-pass compiler usually can't accommodate this without re-plumbing.
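A minimal sketch of that idea, assuming a made-up one-statement language (`let <name> = <expr>`): the front end is a plain function that returns definitions and diagnostics and never emits code, so an editor-style query can be layered on top of it. The names `analyze`, `Analysis`, `Diagnostic`, and `goto_definition` are hypothetical and not taken from any real compiler or from the LSP specification.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Diagnostic:
    line: int
    message: str


@dataclass
class Analysis:
    # Maps each defined name to the line where it was defined.
    definitions: Dict[str, int] = field(default_factory=dict)
    diagnostics: List[Diagnostic] = field(default_factory=list)


LET = re.compile(r"let\s+([A-Za-z_]\w*)\s*=\s*(.+)$")
NAME = re.compile(r"[A-Za-z_]\w*")


def analyze(source: str) -> Analysis:
    """Front end only: parse and check names, never generate code."""
    result = Analysis()
    for lineno, text in enumerate(source.splitlines(), start=1):
        text = text.strip()
        if not text:
            continue
        match = LET.match(text)
        if match is None:
            result.diagnostics.append(
                Diagnostic(lineno, "expected 'let <name> = <expr>'"))
            continue  # keep going so later lines are still checked
        name, expr = match.groups()
        for used in NAME.findall(expr):
            if used not in result.definitions:
                result.diagnostics.append(
                    Diagnostic(lineno, f"undefined name '{used}'"))
        result.definitions[name] = lineno
    return result


def goto_definition(source: str, name: str) -> Optional[int]:
    """An editor-style query built on the same front end."""
    return analyze(source).definitions.get(name)
```

A tool would call `analyze` on every edit and show `diagnostics` as squiggles; a batch driver could call the same function and then hand the result to a code generator.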
replies(2): >>43596467 #>>43605124 #
5. tiu ◴[] No.43596118[source]
I wrote 'multiple codegen architectures' instead of 'multiple architectures for codegen'.

From what I have done in the toy compilers and seen in actual production-ready compilers, the codegen is still very much tied to one thing or another, or else to LLVM.

replies(2): >>43599433 #>>43604777 #
6. halffullbrain ◴[] No.43596467{3}[source]
The Eclipse Compiler for Java [1] is a notable exception, architected around incremental compilation, an API for “live” AST manipulation, and a layered non-batch approach to when to invoke various analysis steps.

The LSP for Java [2], used in e.g. VSCode’s Java plugins, builds on this API.

But, no, I haven’t seen a generalized approach to this architecture discussed in the literature.

[1]: https://github.com/eclipse-jdt/eclipse.jdt.core
[2]: https://github.com/eclipse-jdtls/eclipse.jdt.ls

7. elcritch ◴[] No.43599433{3}[source]
Nim has multiple backends and is relatively mature. It’s fairly readable as compilers go.

There’s also a new experimental rewrite of the Nim compiler called Nimony, which targets a new intermediate called NIFC. That is intended to be transformed to C, LLVM, JavaScript, etc.

8. yencabulator ◴[] No.43602829[source]
The compilerbook link is broken in multiple ways, but www.compilerbook.com redirects to https://www3.nd.edu/~dthain/compilerbook/
replies(1): >>43603044 #
9. tiu ◴[] No.43603044[source]
Yes, thank you. That is the Douglas Thain book I meant. (compilerbook.org seems to work for me, only broken thing would be the semicolon).
10. tester756 ◴[] No.43604777{3}[source]
>I wrote 'multiple codegen architectures' instead of 'multiple architectures for codegen'.

You want to generate e.g. x86 + ARM + RISC-V, yeah? And shouldn't that be a result of modular architecture?

Like, your various codegens just take your AST and generate output.
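As a toy sketch of that shape, assuming a two-node AST (`Num`, `Add`): two independent backends walk the same tree and a small table picks between them. The targets here are a made-up stack machine and a C expression string, not x86/ARM/RISC-V, and every name (`emit_stack`, `emit_c`, `BACKENDS`, `compile_to`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]


def emit_stack(node: Expr) -> List[str]:
    """Hypothetical backend 1: instructions for a made-up stack machine."""
    if isinstance(node, Num):
        return [f"push {node.value}"]
    # Post-order walk: operands first, then the operator.
    return emit_stack(node.left) + emit_stack(node.right) + ["add"]


def emit_c(node: Expr) -> str:
    """Hypothetical backend 2: a fully parenthesized C expression."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({emit_c(node.left)} + {emit_c(node.right)})"


# The driver only knows backend names; the AST is shared by all of them.
BACKENDS = {
    "stack": lambda node: "\n".join(emit_stack(node)),
    "c": emit_c,
}


def compile_to(target: str, node: Expr) -> str:
    return BACKENDS[target](node)
```

For example, `compile_to("c", Add(Num(1), Add(Num(2), Num(3))))` gives `(1 + (2 + 3))`. This only shows the dispatch; it says nothing about how much target-specific detail a real backend has to pull out of the tree.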

11. tester756 ◴[] No.43605124{3}[source]
I'm aware; I've used the C#/.NET compiler ecosystem and its Compiler-as-a-Service approach.

But it seems like you're just creating libs that everyone can embed / use, and that's it

e.g `auto ast = Compiler.GenerateAST(code);`