141 points vblanco | 19 comments
1. Asooka ◴[] No.44436332[source]
I would like to see a comparison between modules and precompiled headers. I have a suspicion that using precompiled headers could provide the same build time gains with much less work.
replies(4): >>44436478 #>>44436777 #>>44436830 #>>44439283 #
2. pjmlp ◴[] No.44436478[source]
As per the Office team, modules are much faster, especially if you also make use of the C++ standard library as a module, available since C++23.

See VC++ devblogs and CppCon/C++Now talks from the team.

Pre-compiled headers have only worked well on Windows, and on OS/2 back in the day.

For whatever reason, UNIX compilers never had a great implementation of them.

The exception is Clang header maps, which were anyway one of the first approaches to C++ modules.

replies(1): >>44438312 #
3. dataflow ◴[] No.44436777[source]
Precompiled headers are generally better for system/3rd-party headers. Modules are better than PCHs for headers you own, although in some cases you may be better off using neither. (I say this because the benefit depends on how frequently you need to recompile them, the relative coupling, etc.) Depending on how heavy each one is in your codebase, and how often you modify global build settings, you may have a different experience. And neither is a substitute for keeping headers lightweight and decoupled.
4. w4rh4wk5 ◴[] No.44436830[source]
From my experience, compile times aren't an issue if you pay a little attention. Precompiled headers, thoughtful forward declarations, and not abusing templates get you a long way.

We are commonly working with games that come with a custom engine and tooling. Compiling everything from scratch (around 1M lines of modern C++ code) takes about 30-40 seconds on my desktop. Rebuilding 1 source file + linking comes in typically under 2 seconds (w/o LTO). We might get this even lower by introducing unity builds, but there's no need for that right now.

replies(1): >>44436956 #
5. ttoinou ◴[] No.44436956[source]
40 seconds for 1M lines seems super fast, do you have a fast computer and/or did you spend a lot of time optimizing the compilation pipeline ?
replies(3): >>44437056 #>>44437789 #>>44441379 #
6. vblanco ◴[] No.44437056{3}[source]
The modern CryEngine compiles very fast. Their trick is that they have architected everything to go through interfaces that live in very thin headers, so their headers end up very light and they don't recompile the class properties over and over. But it's a shame we need tricks like this for compile speed, as they harm runtime performance.
replies(1): >>44437223 #
7. ttoinou ◴[] No.44437223{4}[source]
Why does it ruin runtime performance? The code should be almost the same.
replies(2): >>44437281 #>>44439322 #
8. vblanco ◴[] No.44437281{5}[source]
Because you now need to go through virtual calls on functions that don't really need to be virtual, which means a possible cache miss from loading the function pointer out of the vtable, and then the impossibility of them being inlined. For example, they have an ITexture interface with a function like virtual GetSize(). If it weren't all through virtuals, that size would just be a vec2 in the class, and then it's a simple load that gets inlined.
replies(2): >>44437305 #>>44441425 #
9. ttoinou ◴[] No.44437305{6}[source]
Ah yes, this kind of interface; indeed it doesn't seem like a useful layer when running the program. Maybe compilers could optimize it away, though.
replies(2): >>44439278 #>>44441694 #
10. w4rh4wk5 ◴[] No.44437789{3}[source]
We didn't create this code base ourselves; we are just working with it. I'd assume the original developers paid attention to compile times during development and introduced forward declarations whenever things got out of hand.

My computer is fast, AMD Ryzen 9 7950X, code is stored on an NVMe SSD. But there certainly are projects with fewer lines of code that take substantially longer to compile.

11. fpoling ◴[] No.44438312[source]
This has been puzzling me for over 3 decades. My first experience with C++ was Borland C++ for DOS. It had precompiled headers and they worked extremely well.

Then around 1995 I got access to HP-UX, with its native compiler and GCC. Nobody had heard of precompiled headers there, and people thought the only way to speed up compilation was to get access to a computer with more CPUs and rely on make -j.

And since then there has been no interest in implementing precompiled headers among either free or proprietary vendors.

The only innovation was unity builds, where one includes multiple C++ sources into a super-source. But then Google killed support for them in Chromium, claiming that with their build farm unity builds made things slower, and that supporting them in the Chromium build system was an unbearable burden.
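For reference, the super-source is just a translation unit that includes the others, so shared headers are parsed once instead of once per file; a minimal sketch with hypothetical file names (this fragment is build configuration, not a standalone program):

```cpp
// unity_gfx.cpp -- hypothetical super-source. Each included file is a
// normal .cpp from the project. The build compiles unity_gfx.cpp
// instead of the three files below. The trade-off: their scopes merge,
// so internal-linkage names can now collide across the included files.
#include "texture.cpp"
#include "mesh.cpp"
#include "renderer.cpp"
```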

replies(1): >>44439308 #
12. jeremiahar ◴[] No.44439278{7}[source]
In my experience, as long as there's only a single implementation, devirtualization works well and can even inline the functions. But you need to pass something along the lines of "-fwhole-program-vtables -fstrict-vtable-pointers" + LTO. Of course the vtable pointer is still present in the object. So I personally only use the aforementioned "thin headers" at a system level (IRenderer), rather than for each individual object (ITexture).
13. barchar ◴[] No.44439283[source]
So, clang's modules are quite similar to clang's precompiled headers, especially the "chained" PCHs. With a PCH you have to wait on the serial PCH compilation step before you can get any parallelism; with modules you can compile each part of the "PCH" in parallel, and anything using some subset of your dependencies can get started without waiting on things it doesn't use.

Header units are basically chained PCHs. Sadly they are hard to build correctly at the moment.

14. barchar ◴[] No.44439308{3}[source]
Fwiw, doing a unity build with ThinLTO can yield lovely results. That way you still get parallel _and_ incremental codegen.
replies(1): >>44442047 #
15. barchar ◴[] No.44439322{5}[source]
In addition to what everyone else has said, it also makes it difficult to allocate the type on the stack. Even if you do allow it, you'll at least need a probe.
16. ◴[] No.44441379{3}[source]
17. pjmlp ◴[] No.44441425{6}[source]
At least on clang with LTO, with the bitcode variant, it should be possible to devirtualize, assuming most of those interfaces only have a single implementation.
18. drysine ◴[] No.44441694{7}[source]
They can, sometimes:

https://quuxplusone.github.io/blog/2021/02/15/devirtualizati...

19. nh2 ◴[] No.44442047{4}[source]
Do you have some examples?

I cannot find any reports of the speedups people get from combining Jumbo/Unity builds with ThinLTO.