
32 points | matt_d | 1 comment
meisel No.45159363
Very interesting stuff. However, for my day-to-day work, I'm in a large C++ code base where most of the code has to be in headers due to templating. The bottlenecks are, very roughly:

- Header parsing (40% of time)

- Template instantiation (40% of time)

- Backend (20% of time)

For my use case, it seems like this cache would only kick in after 80% of the work has already been done. Ccache, on the other hand, doesn't require any of that work to be done. On a side note, template instantiation caching is a very interesting strategy, but today's compilers don't use it (there was a commercially sold compiler a while back that did have it, though).
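
To make that setup concrete, here's a minimal sketch (file and type names are made up): everything lives in a header because it's a template, so every translation unit that includes it pays for parsing and instantiation again, while ccache keys its lookup on the preprocessed source and can return a cached object before any of that work starts.

    // matrix.hpp -- header-only, because the implementation is a template
    #pragma once
    #include <array>
    #include <cstddef>

    template <typename T, std::size_t N>
    struct Matrix {
        std::array<T, N * N> data{};
        T trace() const {
            T sum{};
            for (std::size_t i = 0; i < N; ++i) sum += data[i * N + i];
            return sum;
        }
    };

    // a.cpp and b.cpp both contain:
    //   #include "matrix.hpp"          // each TU re-parses the header    (~40%)
    //   Matrix<double, 64> m;          // ...and re-instantiates the code (~40%)
    //   double t = m.trace();
    //
    // An IR-level cache only helps after both steps have already run; a ccache
    // hit returns the stored object file without invoking the compiler at all.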

replies(1): >>45160214 #
aengelke No.45160214
Template instantiation caching is likely to help -- in an unoptimized LLVM build, I found that 40-50% of the compiled code at the object-file level is discarded at link time as redundant.
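
That redundancy is exactly what C++11 explicit instantiation declarations let you remove by hand today: the instantiation is emitted in one designated TU instead of in every object file and then thrown away by the linker. A rough sketch with made-up names:

    // matrix.hpp -- full template definition as usual
    #pragma once
    template <typename T>
    struct Matrix {
        T value{};
        T trace() const { return value; }
    };

    // Suppress the implicit instantiation in every TU that includes this header:
    extern template struct Matrix<double>;

    // matrix_instantiations.cpp -- the single TU that actually emits the code:
    //   #include "matrix.hpp"
    //   template struct Matrix<double>;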

Another thing I'd consider interesting is parse caching, from tokens to AST. Most headers don't change, so even when a TU needs to be recompiled, most parts of the AST could be reused. (Some kind of cleverer, more transparent precompiled headers.) This would likely require changes to the AST data structures for fast serialization and loading/insertion. And that makes me think that maybe the textbook approach of generating an AST is a bad idea if we care about fast compilation.
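
The closest shipping approximation is probably C++20 modules (or clang's precompiled headers): the interface is parsed once into a serialized form, and importers load that instead of re-lexing and re-parsing text. A sketch, assuming a modules-capable toolchain and illustrative file names:

    // math.cppm -- module interface unit, parsed once and stored in serialized form
    export module math;

    export template <typename T>
    T square(T x) { return x * x; }

    // main.cpp -- loads the serialized interface instead of re-parsing a header
    import math;

    int main() { return square(3) - 9; }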

Tangentially, I'm astonished that they claim correctness while a large amount of IR is inadequately (if at all) captured in the hash (comdat, symbol visibility, aliases, constant exprs, blockaddress, calling convention/attributes for indirect calls, phi nodes, fast-math flags, GEP type, ...). I'm also a bit annoyed, because this is the type of research that is very sloppily implemented, only evaluates projects where compile time is not a big problem and therefore achieves only small absolute savings, and papers over inherent difficulties (here: capturing the IR, parse time), which makes it unlikely to be used in practice.
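
For a sense of what "capturing the IR" means here: a content hash that wants to be correct has to fold in attributes like the ones listed above, not just the instruction stream. A rough, deliberately incomplete sketch against the LLVM C++ API (not the paper's actual implementation):

    #include "llvm/ADT/Hashing.h"
    #include "llvm/IR/Comdat.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/InstrTypes.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/Operator.h"

    // Fold per-function and per-instruction properties that affect codegen into
    // the hash -- the kinds of things the comment above says are missing.
    llvm::hash_code hashFunction(const llvm::Function &F) {
      llvm::hash_code H =
          llvm::hash_combine(F.getCallingConv(), F.getVisibility());
      if (const llvm::Comdat *C = F.getComdat())
        H = llvm::hash_combine(H, C->getName(), C->getSelectionKind());

      for (const llvm::BasicBlock &BB : F)
        for (const llvm::Instruction &I : BB) {
          H = llvm::hash_combine(H, I.getOpcode());
          if (auto *FPOp = llvm::dyn_cast<llvm::FPMathOperator>(&I))
            H = llvm::hash_combine(H, FPOp->getFastMathFlags().isFast(),
                                   FPOp->getFastMathFlags().noNaNs());
          if (auto *CB = llvm::dyn_cast<llvm::CallBase>(&I))
            H = llvm::hash_combine(H, CB->getCallingConv());
          // ...plus operands, GEP source element types, phi incoming blocks,
          // constant expressions, blockaddress uses, aliases, and so on.
        }
      return H;
    }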

replies(2): >>45161015 #>>45161129 #
fsfod No.45161129
There was a commercial fork of Clang, zapcc [1], that cached headers and template instantiations using an in-memory client-server system [2], but I don't know whether they solved all the correctness issues before it was abandoned.

[1] https://github.com/yrnkrn/zapcc

[2] https://lists.llvm.org/pipermail/cfe-dev/2015-May/043155.htm...

replies(1): >>45161299 #
meisel No.45161299
Yes, that's the one I was thinking of, thank you.