1) Modules only really help with time spent parsing, not time spent on codegen. They can actually hurt codegen performance, because importing a module can make more definitions available to the optimizer for inlining/global optimizations, even in non-LTO builds. For that reason it's probably best to compare with ThinLTO enabled in both the header build and the modules build.
2) When your dependencies aren't modularized yet, you tend to end up with pretty big global module fragments, which inflates both the size of your BMIs and the parsing time. Header units are supposed to partially address this, but right now no build system supports them properly (except perhaps MSBuild?). Clang is also pretty bad at pruning unused declarations out of the global module fragment, which makes this worse still.
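
To make point 2 concrete, here is a minimal sketch of a module interface unit whose dependencies are not modularized. The file name `widget.cppm`, the module name `widget`, and the `Widget` class are all made up for illustration. Everything between `module;` and `export module widget;` is the global module fragment, and every importer-visible declaration pulled in from those headers is a candidate for ending up in the BMI.

```cpp
// widget.cppm: hypothetical module interface unit (names are illustrative only)
module;                  // opens the global module fragment

// Non-modular dependencies can only be brought in by #include here.
// Each header is parsed in full when this interface unit is compiled.
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

export module widget;    // closes the global module fragment, starts the module purview

export class Widget {
public:
    explicit Widget(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

    // Defined in-class, so the definition is visible to importers and
    // can be considered for inlining in their translation units (point 1).
    void add_tag(std::string tag) { tags_.push_back(std::move(tag)); }
    std::size_t tag_count() const { return tags_.size(); }

private:
    std::string name_;
    std::vector<std::string> tags_;
};
```

With working header unit support, the `#include` lines could in principle become `import <string>;` and `import <vector>;` placed after the `export module widget;` line, so the headers would be compiled once and shared instead of being re-parsed into each global module fragment. Whether that works in practice depends on your compiler and build system.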