
392 points by mfiguiere | 5 comments
bogwog ◴[] No.35471515[source]
I feel so lucky that I found waf[1] a few years ago. It just... solves everything. Build systems are notoriously difficult to get right, but waf is about as close to perfect as you can get. Even when it doesn't do something you need, or it does things in a way that doesn't work for you, the amount of work needed to extend/modify/optimize it to your project's needs is tiny (minus the learning curve ofc, but the core is <10k lines of Python with zero dependencies), and doesn't require you to maintain a fork or anything like that.

The fact that the Buck team felt they had to do a from-scratch rewrite to build the features they needed just goes to show how hard it is to design something robust in this area.

If there are any people in the Buck team here, I would be curious to hear if you all happened to evaluate waf before choosing to build Buck? I know FB's scale makes their needs unique, but at least at a surface level, it doesn't seem like Buck offers anything that couldn't have been implemented easily in waf. Adding Starlark, optimizing performance, implementing remote task execution, adding fancy console output, implementing hermetic builds, supporting any language, etc...

[1]: https://waf.io/

replies(7): >>35471805 #>>35471941 #>>35471946 #>>35473733 #>>35474259 #>>35476904 #>>35477210 #
klodolph ◴[] No.35474259[source]
> If there are any people in the Buck team here, I would be curious to hear if you all happened to evaluate waf before choosing to build Buck?

There’s no way Waf can handle code bases as large as the ones inside Facebook (Buck) or Google (Bazel). Waf also has some problems with cross-compilation, IIRC. Waf would simply choke.

If you think about the problems you run into with extremely large code bases, then the design decisions behind Buck/Bazel/etc. start to make a lot of sense. Things like how targets are labeled as //package:target, rather than paths like package/target. Package build files are only loaded as needed, so your build files can be extremely broken in one part of the tree, and you can still build anything that doesn’t depend on the broken parts. In large code bases, it is simply not feasible to expect all of your build scripts to work all of the time.
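
To make that concrete, here's a toy sketch in Python (made-up names, not actual Buck/Bazel internals) of why //package:target labels play so well with lazy loading: a label points at exactly one package's build file, so the tool only ever evaluates the packages it actually reaches.

    import os

    _loaded = {}  # package path -> {target name: rule attributes}

    def parse_label(label):
        # "//foo/bar:baz" -> ("foo/bar", "baz"); "//foo/bar" means "//foo/bar:bar"
        pkg, _, name = label[2:].partition(":")
        return pkg, (name or pkg.rsplit("/", 1)[-1])

    def load_package(root, pkg):
        # Evaluated on demand and memoized, so a broken BUILD file in an
        # unrelated part of the tree never even gets parsed.
        if pkg not in _loaded:
            rules = {}
            def cc_library(name, **kwargs):   # stand-in for a real rule
                rules[name] = kwargs
            with open(os.path.join(root, pkg, "BUILD")) as f:
                exec(f.read(), {"cc_library": cc_library})
            _loaded[pkg] = rules
        return _loaded[pkg]

    def resolve(root, label):
        pkg, name = parse_label(label)
        return load_package(root, pkg)[name]

The point is that "which file do I need to read?" becomes a pure string operation on the label, with no globbing or directory walking.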

The Python -> Starlark change was made because the build scripts need to be completely hermetic and deterministic. Starlark is reusable outside Bazel/Buck precisely because other projects want that same hermeticity and determinism.
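
To put it another way: a build file evaluated as ordinary Python can do any of the following (an illustrative snippet, not from any real repo), and each one makes the graph depend on where and when it was evaluated. Starlark-the-language simply has none of these: no file I/O, no environment variables, no clock, no randomness.

    import os, random, time

    # Things a Python-as-build-language file *could* do (illustrative only),
    # each of which bakes machine state into the build graph:
    srcs = sorted(f for f in os.listdir(".") if f.endswith(".c"))  # local checkout state
    debug = os.environ.get("DEBUG") == "1"                         # invoking shell's env
    stamp = int(time.time())                                       # different every run
    shard = random.randint(0, 9)                                   # different every run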

Waf is nice, but I really want to emphasize just how damn large the codebases are that Bazel and Buck handle. They are large enough that you cannot load the entire build graph into memory on a single machine—neither Facebook nor Google is willing to put that much RAM in a single server just to run builds or build queries. Some of these design decisions are basically there so that you can load subsets of the build graph and cache parts of it. You want to hit cache as much as possible.
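
For a rough sense of scale (every number below is my own assumption, for illustration only, not an actual FB/Google figure): each rule expands into a handful of actions, and each action drags along argv, environment, and input/output path lists that easily run to a couple of KB of strings.

    targets = 50_000_000        # rules in the repo (assumed)
    actions_per_target = 4      # compile/link/codegen/copy steps per rule (assumed)
    bytes_per_action = 2_000    # argv + env + input/output paths as strings (assumed)
    total = targets * actions_per_target * bytes_per_action
    print(total / 1e12, "TB")   # -> 0.4 TB of raw metadata, before any indexes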

I’ve used Waf and its predecessor SCons, and I’ve also used Buck and Bazel.

replies(3): >>35475404 #>>35475425 #>>35476956 #
nextaccountic ◴[] No.35475404[source]
> They are large enough that you cannot load the entire build graph into memory on a single machine

You mean, multiple gigabytes of build metadata that just says things like “X depends on Y” and “to build Y you run command Z”?

replies(2): >>35476791 #>>35477370 #
klodolph ◴[] No.35476791[source]
Yes. By “multiple gigabytes” I am talking about >100 GB. Maybe >1 TB.
replies(1): >>35476902 #
1. nextaccountic ◴[] No.35476902[source]
How is this even possible? I take it that this data is highly compressible, right?
replies(1): >>35477038 #
2. phyrex ◴[] No.35477038[source]
It wouldn’t be compressed in RAM though, would it?
replies(1): >>35477121 #
3. nextaccountic ◴[] No.35477121[source]
There are in-memory succinct data structures (https://en.wikipedia.org/wiki/Succinct_data_structure), but I don't actually mean that specifically. I mean that, for example, there must be tons of strings with common prefixes, like file paths (which can be stored in a trie for faster access and to compress the data in RAM) or very similar strings (like compiler invocations that mostly share the same flags), and other highly redundant data that can usually be exploited to cut down on memory requirements.
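
For example, something like this toy sketch of the prefix-sharing idea (it only counts characters; the real saving depends on per-node overhead):

    # Toy path trie: each directory component is stored once per branch,
    # instead of the shared prefix being repeated in every path string.
    class Node:
        __slots__ = ("children",)
        def __init__(self):
            self.children = {}

    def insert(root, path):
        node = root
        for part in path.split("/"):
            node = node.children.setdefault(part, Node())

    root = Node()
    paths = [f"third_party/libfoo/src/module_{i}/impl_{j}.cc"
             for i in range(100) for j in range(50)]
    for p in paths:
        insert(root, p)

    flat_chars = sum(len(p) for p in paths)
    trie_chars, stack = 0, [root]
    while stack:
        node = stack.pop()
        for part, child in node.children.items():
            trie_chars += len(part)
            stack.append(child)
    print(flat_chars, "chars as flat strings vs", trie_chars, "in the trie")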

I highly doubt that, after doing all those tricks, you still end up with 100GB - 1TB of build data.

replies(1): >>35480290 #
4. Shish2k ◴[] No.35480290{3}[source]
You could do those tricks and cut down memory, perhaps even 10x, but they come at the cost of increased CPU time. Designing the system in such a way that you only ever need to load a tiny subset of the graph at one time gives you a 1000x saving for memory and CPU.
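
Roughly (a toy sketch, not anything Buck/Bazel literally does), assuming some resolve(root, label) that lazily loads a single package's build file and returns a dict with a "deps" list:

    def transitive_deps(root, label, resolve):
        # Walks only the packages reachable from `label`; everything else
        # in the repo is never loaded or parsed at all.
        seen, stack = set(), [label]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(resolve(root, cur).get("deps", []))
        return seen
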
replies(1): >>35481806 #
5. nextaccountic ◴[] No.35481806{4}[source]
Some of those tricks may actually decrease CPU time (by fetching less data from RAM and using the CPU cache more effectively). And you can also apply any optimizations for partial loading on top of that.

I guess the downside is that the system would be more complex overall, but you can probably get 80% of the result with relatively small changes.