392 points by mfiguiere | 58 comments
1. bogwog ◴[] No.35471515[source]
I feel so lucky that I found waf[1] a few years ago. It just... solves everything. Build systems are notoriously difficult to get right, but waf is about as close to perfect as you can get. Even when it doesn't do something you need, or it does things in a way that doesn't work for you, the amount of work needed to extend/modify/optimize it to your project's needs is tiny (minus the learning curve ofc, but the core is <10k lines of Python with zero dependencies), and doesn't require you to maintain a fork or anything like that.
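
For anyone who hasn't used it, a whole project's wscript can be as small as this (a minimal sketch with made-up file names, going from memory):

    # wscript -- waf build scripts are plain Python
    def options(opt):
        opt.load('compiler_cxx')

    def configure(conf):
        conf.load('compiler_cxx')
        conf.env.append_value('CXXFLAGS', ['-O2', '-Wall'])

    def build(bld):
        bld.stlib(source='src/util.cpp', target='util', includes='include')
        bld.program(source='src/main.cpp', target='app', use='util', includes='include')

Then `./waf configure build` does the rest.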

The fact that the Buck team felt they had to do a from-scratch rewrite to build the features they needed just goes to show how hard it is to design something robust in this area.

If there are any people in the Buck team here, I would be curious to hear if you all happened to evaluate waf before choosing to build Buck? I know FB's scale makes their needs unique, but at least at a surface level, it doesn't seem like Buck offers anything that couldn't have been implemented easily in waf. Adding Starlark, optimizing performance, implementing remote task execution, adding fancy console output, implementing hermetic builds, supporting any language, etc...

[1]: https://waf.io/

replies(7): >>35471805 #>>35471941 #>>35471946 #>>35473733 #>>35474259 #>>35476904 #>>35477210 #
2. xxpor ◴[] No.35471805[source]
I truly believe any build system that uses a general-purpose language by default is too powerful. It lets people do silly stuff too easily. Build systems (for projects with a lot of different contributors) should be easy to understand, with few, if any, project specific concepts to learn. There can always be an escape hatch to python (see GN, for example), but 99% of the code should just be boring lists of files to build.
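
To be concrete about "boring lists of files", this is roughly the ceiling of what most build code should look like (a made-up Bazel/Buck-style target, purely illustrative):

    cc_library(
        name = "netutil",
        srcs = ["socket.cc", "dns.cc"],
        hdrs = ["netutil.h"],
        deps = ["//third_party/openssl"],
    )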
replies(7): >>35471972 #>>35472156 #>>35473592 #>>35473611 #>>35475214 #>>35476355 #>>35476919 #
3. jsgf ◴[] No.35471941[source]
I don't know if they considered waf specifically, but the team is definitely very familiar with the state of the art: https://www.microsoft.com/en-us/research/uploads/prod/2018/0...

One of the key requirements is that Buck2 had to be an (almost) drop-in replacement for Buck1 since there's no way we could reasonably rewrite all the millions of existing build rules to accommodate anything else.

Also Buck needs to support aggressive caching, and doing that reliably puts lots of other constraints on the build system (eg deterministic build actions via strong hermeticity) which lots of build systems don't really support. It's not clear to me whether waf does, for example (though if you squint it does look a bit like Buck's rule definitions in Starlark).
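
To make "deterministic build actions via strong hermeticity" concrete: every action declares exactly what it reads and writes, so its output can be keyed on a hash of the inputs and command alone and safely pulled from a cache. Roughly like this (Bazel-style syntax for illustration, names invented; Buck's genrule is similar in spirit):

    genrule(
        name = "version_header",
        srcs = ["version.txt"],
        outs = ["version.h"],
        # Only the declared srcs are visible to the command; anything it
        # reads outside of them would break caching.
        cmd = "sed 's/^/#define BUILD_VERSION /' $(location version.txt) > $@",
    )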

4. softfalcon ◴[] No.35471946[source]
I could be wrong as I haven't dug into the waf docs too too much, but I think the major difference between waf and Buck is the ability to handle dependency management between various projects in a large org.

The documentation and examples for waf seem to be around building one project, in one language, with an output of statistics and test results. I am sure this is a simplification for education and documentation purposes, but it does leave a vague area around "what if I have more than 1 or 2 build targets + 5 libs + 2 apps + 3 interdependent helper libraries?"

Buck seems to be different in that it does everything waf does but also has clear `dep` files to map dependencies between various libraries within a large repository with many, many different languages and build environments.

The key thing here being, I suspect that within Meta's giant repositories of various projects, they have a tight inter-linking between all these libraries and wanted build tooling that could not only build everything, but be able to map the dependency trees between everything as well.

Pair that with a bunch of consolidated release mapping between the disparate projects and their various links and you have a reason why someone would likely choose Buck over waf purely from a requirements side.

As for another reason they likely chose Buck over waf: it would appear that waf is a capable but lesser-known project in the wider dev community. I say this because when I look into waf, I mostly see it compared against CMake; its mindshare resides mostly among C++ devs. Either because of NIHS (not invented here syndrome) or fear that the project wouldn't be maintained over time, Meta may have decided to just roll their own tooling. They seem to be really big on the whole "being the SDK of the internet" as of late. I could see them not wanting to support an independent BSD-licensed library they don't have complete control over.

These are just my thoughts, I could be completely wrong about everything I've said, but they're my best insights into why they likely didn't consider waf for this.

replies(1): >>35472155 #
5. bogwog ◴[] No.35471972[source]
I agree, that’s also pretty much why Starlark exists. However, there are many cases where you do need complex build logic.

Personally, I always go for declarative CMake first, then waf as soon as I find my CMakeLists looking like something other than just a list of files.

I’ve considered before creating a simple declarative language to build simple projects like that with waf, but I don’t like the idea of maintaining my own DSL for such little benefit, when CMake works just fine, and everyone knows how to use it. I feel like I’d end up with my own little TempleOS if I decided to go down that rabbit hole.

6. bogwog ◴[] No.35472155[source]
It’s true that Waf doesn’t come with dependency management out of the box (EDIT: unless you count pkg-config), so maybe that’s why (besides NIHS). The way I handle it is with another excellent project called Conan (https://conan.io/)

However, if you’re going to build a custom package management system anyways, there’s no reason you couldn’t build it on top of waf. Again, the core is tiny enough that one engineer could realistically hold the entire thing in their head.

But I don’t think we’re going to get it right speculating here lol. I’m sure there was more to it than NIHS, or being unaware of waf.

replies(1): >>35473115 #
7. pjmlp ◴[] No.35472156[source]
They are the bane of any DevOps/Build Engineer when trying to fix build issues.
8. joshuamorton ◴[] No.35473115{3}[source]
A number of things like being written in python start to matter at big scale. I love python, but cli startup time in python is actually a concern for apps used many times daily by many engineers.

Fixing that or moving to a daemon or whatever starts to take more time than just redoing it from scratch, and if the whole thing is 10k lines of python, it's something a domain expert can mostly reimplement in a week to better serve the fb specific needs.

replies(2): >>35473486 #>>35475753 #
9. bogwog ◴[] No.35473486{4}[source]
I've been using Waf for a couple of years, including on retro thinkpads from ~08. I've never run into issues with the startup time for waf and/or Python. Even if the interpreter were 100x slower to start and execute than it currently is, that time would be negligible next to the time spent waiting for a compiler or other build task to complete.

And if it is too slow, there's profiling support for tracking down bottlenecks, and many different ways to optimize them. This includes simply optimizing your own code, or changing waf internal behavior to optimize specific scenarios. There's even a tool called "fast_partial" which implements a lot more caching than usual project-wide to reduce time spent executing Python during partial rebuilds in projects with an obscene number of tasks.

> Fixing that or moving to a daemon or whatever starts to take more time than just redoing it from scratch, and if the whole thing is 10k lines of python, it's something a domain expert can mostly reimplement in a week to better serve the fb specific needs.

Well, considering Buck just went through a from-scratch rewrite, I would argue otherwise. Although, to be fair, that 10k count is just for the core waflib. There are extra modules to support compiling C/C++/Java/etc for real projects.

(also, waf does have a daemon tool, but it has external dependencies so it's not included by default)

replies(1): >>35474764 #
10. DrBazza ◴[] No.35473592[source]
The problem with build systems is their users, for exactly the reason you say: for a man with a hammer, every problem is a nail. Developers don’t think of build systems in the right way. If you’re doing something complex in your build, it should surely be a build task in its own right.
11. baby ◴[] No.35473611[source]
I think I would agree as well. So I’m not sure how that makes me feel about nix.
replies(3): >>35474293 #>>35478153 #>>35490145 #
12. PaulDavisThe1st ◴[] No.35473733[source]
And the best part about waf? The explicit design intent that you include the build system with the source code. This gets rid of all the problems with build systems becoming backwards/forwards incompatible, and trying to deal with the issues when a developer works on one project using build system v3.9 and another that uses build system v4.6.

With waf, the build system is trivially included in the source, and so your project always uses the right version of waf for itself.

13. klodolph ◴[] No.35474259[source]
> If there are any people in the Buck team here, I would be curious to hear if you all happened to evaluate waf before choosing to build Buck?

There’s no way Waf can handle code bases as large as the ones inside Facebook (Buck) or Google (Bazel). Waf also has some problems with cross-compilation, IIRC. Waf would simply choke.

If you think about the problems you run into with extremely large code bases, then the design decisions behind Buck/Bazel/etc. start to make a lot of sense. Things like how targets are labeled as //package:target, rather than paths like package/target. Package build files are only loaded as needed, so your build files can be extremely broken in one part of the tree, and you can still build anything that doesn’t depend on the broken parts. In large code bases, it is simply not feasible to expect all of your build scripts to work all of the time.
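
To make that concrete, a target in one package can name a target in another by label, and only the packages on that dependency path ever get loaded (paths invented):

    # //services/search/BUILD -- loaded lazily, only when something needs it
    cc_binary(
        name = "frontend",
        srcs = ["frontend.cc"],
        deps = [
            ":query_lib",          # //services/search:query_lib, same package
            "//base/strings",      # a distant package, loaded on demand
        ],
    )

A broken BUILD file somewhere else in the tree never even gets parsed when you build //services/search:frontend.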

The Python -> Starlark change was made because the build scripts need to be completely hermetic and deterministic. Starlark is reusable outside Bazel/Buck precisely because other projects want that same hermeticity and determinism.

Waf is nice but I really want to emphasize just how damn large the codebases are that Bazel and Buck handle. They are large enough that you cannot load the entire build graph into memory on a single machine—neither Facebook nor Google have the will to load that much RAM into a single server just to run builds or build queries. Some of these design decisions are basically there so that you can load subsets of the build graph and cache parts of the build graph. You want to hit cache as much as possible.

I’ve used Waf and its predecessor SCons, and I’ve also used Buck and Bazel.

replies(3): >>35475404 #>>35475425 #>>35476956 #
14. xxpor ◴[] No.35474293{3}[source]
Nix is different because no one's smart enough to figure out how to do silly things ;)
15. joshuamorton ◴[] No.35474764{5}[source]
> Well, considering Buck just went through a from-scratch rewrite, I would argue otherwise

Based on what, the idea that waf fits their needs better than the tool they wrote and somehow wouldn't need to be rewritten or abandoned?

> Even if the interpreter were 100x slower to start and execute than it currently is, that time would be negligible next to the time spent waiting for a compiler or other build task to complete.

This wrongly assumes that clean builds are the only use case. Keep in mind that in many cases when using buck or bazel, a successful build can complete without actually compiling anything, because all of the artifacts are cached externally.

> There's even a tool called "fast_partial" which implements a lot more caching than usual project-wide to reduce time spent executing Python during partial rebuilds in projects with an obscene number of tasks

Right, the fact that this is a concern to some people, and that there's clearly some tradeoff here such that it isn't the default, immediately rings alarm bells.

replies(2): >>35475287 #>>35478583 #
16. sangnoir ◴[] No.35475214[source]
You cannot magick away complexity. Large systems (think thousands of teams with hundreds of commits per minute) require a way to express complexity. When all is said and done, you'll have a Turing-complete build system anyway - so why not go with something readable?
replies(3): >>35475417 #>>35478377 #>>35478538 #
17. bogwog ◴[] No.35475287{6}[source]
No offense, but I think you're reading too much into my casual comments here to guide your understanding of waf, rather than the actual waf docs. Waf isn't optimized for clean builds (quite the contrary), and neither you nor I know whether the waf defaults are insufficient for whatever Buck is being used for. I just pointed out the existence of that "fast_partial" thing to show how deep into waf internals a project-specific optimization effort could go.

But discussions about optimization are pointless without real world measurements and data.

replies(1): >>35478396 #
18. nextaccountic ◴[] No.35475404[source]
> They are large enough that you cannot load the entire build graph into memory on a single machine

You mean, multiple gigabytes for build metadata that just says things like "X depends on Y, and to build Y you run command Z"?

replies(2): >>35476791 #>>35477370 #
19. xxpor ◴[] No.35475417{3}[source]
I seriously doubt there's a single repo on the planet that averages hundreds of commits per minute. That's completely unmanageable for any number of reasons.
replies(3): >>35475739 #>>35476653 #>>35477280 #
20. bogwog ◴[] No.35475425[source]
I get that, but again, there's no reason Waf can't be used as a base for building that. I actually use Waf for cross compilation extensively, and have built some tools around it with Conan for my own projects. Waf can handle cross compilation just fine, but it's up to you to build what that looks like for your project (a common pattern I see is custom Context subclasses for each target)

Memory management, broken build scripts, etc. can all be handled with Waf as well. In the simplest case, you can just wrap a `recurse` call in a try/except block, or you can build something much more sophisticated around how your projects are structured.
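
As a sketch of the simple case (directory names invented):

    # wscript -- tolerate broken subprojects instead of failing the whole build
    from waflib import Errors, Logs

    def build(bld):
        for d in ('libs/core', 'libs/experimental', 'tools/codegen'):
            try:
                bld.recurse(d)
            except Errors.WafError as e:
                Logs.warn('skipping %s: %s' % (d, e))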

Note, I'm not trying to argue that Google/Facebook "should have used X". There are a million reasons to pick X over Y, even if Y is the objectively better choice. Sometimes, molding X to be good enough is more efficient than spending months just researching options hoping you'll find Y.

I'm just curious to know if they did evaluate Waf, why did they decide against it.

replies(2): >>35476812 #>>35476905 #
21. chucknthem ◴[] No.35475739{4}[source]
It wouldn't surprise me at all if some large repos at Google or Facebook now get to that many, it's easy to do once you have robots committing code (usually configuration changes).
22. cozzyd ◴[] No.35475753{4}[source]
Just imagine how much memory a large dependency graph would take in Python...

Especially considering how poor Python's support for shared memory concurrency is...

23. bluGill ◴[] No.35476355[source]
I'd like to agree, but every significant project has something weird that their build system doesn't have built in yet. So you need some way to extend it. Useful build systems end up supporting lots of hacks.

That said, the more your build systems makes easy without having to write code the better.

replies(1): >>35476446 #
24. joshuamorton ◴[] No.35476446{3}[source]
While this is true, this isn't a problem for your buck/bazel/pants-likes. Between genrules and custom rules you can do this with all the power you (usually) need.
25. sangnoir ◴[] No.35476653{4}[source]
I didn't mean on average, but the build tool has to handle the worst case and I probably am understating the worst case.

I'd bet there are more than a few repos that do get (at least) hundreds of commits as a high-water mark. My guess is lots of engineers + mono-repo + looming code-freeze deadline can do that like clockwork.

Edit: Robots too as sibling pointed out. A single human action may result in dozens of bot-generated commits

replies(1): >>35477303 #
26. klodolph ◴[] No.35476791{3}[source]
Yes. By “multiple gigabytes” I am talking about >100 GB. Maybe >1 TB.
replies(1): >>35476902 #
27. davnn ◴[] No.35476812{3}[source]
> the core is <10k lines of Python with zero dependencies

Isn't that already a no-go, to write a performance-critical system in a slow programming language?

replies(1): >>35476929 #
28. nextaccountic ◴[] No.35476902{4}[source]
How is this even possible? I take it that this data is highly compressible, right?
replies(1): >>35477038 #
29. rtpg ◴[] No.35476904[source]
waf looks pretty nice but does it have a remote cache? For me the biggest argument for Bazel is the remote caching, and not having it is a bit of a deal breaker IMO
30. klodolph ◴[] No.35476905{3}[source]
I don’t see how using Waf as a base would help in any way. It seems like a massive mismatch for the problems that Facebook and Google are solving. You seem to be fond of Waf, maybe if you elaborated why you think that Waf would be a good base for compiling massive, multi-language code-bases, I could understand where you are coming from. Where I am coming from—it feels like Waf is kind of a better version of autotools, or something like that, and it’s just not in the same league. It’s like comparing a bicycle to a cargo ship. Like, “Why didn’t the people designing the cargo ship use the bicycle as a starting point?” I don’t want to abuse analogies here, but that’s what the question sounds like to me. This is based on my relatively limited experience using Waf (and SCons, which I know is different), and my experience using Bazel and Buck.

Having spent a lot of time with Buck and Bazel, there are just so many little things you run into where you go, “Oh, that explains why Buck or Bazel is designed that way.” These design decisions permeate Buck and Bazel (Pants, Please, etc.)

I just don’t see how Waf can be used as a base. I really do see this as a new “generation” of build systems, with Buck, Bazel, Please, and Pants, and everything else seems so much more primitive by comparison.

replies(1): >>35477083 #
31. ◴[] No.35476919[source]
32. taeric ◴[] No.35476929{4}[source]
I am no python fan, but find it laughably hard to believe it could be what makes a build coordination system slow.
replies(1): >>35478403 #
33. jsgf ◴[] No.35476956[source]
With Buck2, memory taken for the graph is a concern, but it fits into a single host's RAM.
replies(1): >>35488520 #
34. phyrex ◴[] No.35477038{5}[source]
It wouldn’t be compressed in ram though, would it?
replies(1): >>35477121 #
35. bogwog ◴[] No.35477083{4}[source]
I’m coming from the perspective of someone who has been working with it for a while, and coincidentally very intensely hacking away at it recently.

The thing about waf is that it’s more designed like a framework than a typical build tool. If you look at the code, it’s split into a core library (that’s the <10k LOC I estimated), and additional tools that do things like add C++ or Java build support.

That’s one of the reasons I like Waf, since it becomes a powerful toolkit for creating a custom build system once you strip away the thin outer layer. There is no one-size-fits-all build system, so a tool that can be molded like waf is very powerful imo.

I guess it’s hard to get that point across without experiencing it. There are just so many good design decisions everywhere. For example, extensibility comes easily because task generator methods are “flat”, and ordering is implemented via constraints. This means you can easily slip your own functions between any built in generator method to manipulate their inputs or outputs. It’s like a sub-build system just for creating Task objects.
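
A tiny example of what I mean, a sketch of a project-local tweak slipped in via the ordering constraints (the directory name is made up):

    # Runs for every task generator with the 'cxx' feature, just before
    # waf's own apply_incpaths turns 'includes' into compiler flags.
    from waflib.TaskGen import feature, before_method

    @feature('cxx')
    @before_method('apply_incpaths')
    def add_generated_headers(self):
        self.includes = self.to_list(getattr(self, 'includes', [])) + ['build/generated']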

Also, I don’t want to give the impression that I think waf would have been a better choice for these companies. I’ve kind of been defending it a lot in this thread, but my original point/question was just to know if they evaluated waf/what they thought about it. After so many comments I feel like I might be coming off as hostile… which isn’t my intention.

replies(1): >>35486158 #
36. nextaccountic ◴[] No.35477121{6}[source]
There are in-memory succinct data structures (https://en.wikipedia.org/wiki/Succinct_data_structure), but I actually don't mean that specifically: I mean that, for example, there must be tons of strings with common prefixes, like file paths (which can be stored in a trie to have faster access and compress data in RAM) or very similar strings (like compiler invocations that mostly have the same flags), and other highly redundant data that can usually be used to cut down on memory requirements.

I highly doubt that, after doing all those tricks, you still end up with 100GB - 1TB of build data.
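
A quick back-of-the-envelope sketch of the kind of sharing I mean (the paths, flags and counts are made up):

    import sys

    prefix = "fbcode/some/deeply/nested/library/dir/"
    flags = ["clang++", "-std=c++17", "-O2", "-g", "-Ifbcode", "-c"]

    # Naive layout: every node stores its own full path and full command line.
    naive = sum(sys.getsizeof(prefix + "file_%04d.cpp" % i)
                + sys.getsizeof(" ".join(flags) + " " + prefix + "file_%04d.cpp" % i)
                for i in range(1000))

    # Shared layout: store the common prefix and the flag list once,
    # and only the short varying suffix per node.
    shared = (sys.getsizeof(prefix)
              + sum(sys.getsizeof(f) for f in flags)
              + sum(sys.getsizeof("file_%04d.cpp" % i) for i in range(1000)))

    print(naive // 1024, "KB naive vs", shared // 1024, "KB shared")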

replies(1): >>35480290 #
37. scrollaway ◴[] No.35477210[source]
Waf bills itself as "the meta build system". But Buck2 is "the Meta build system". :)
38. kps ◴[] No.35477280{4}[source]
According to [1], in 2015 Google averaged 25 commits per minute (250000/7/24/60). I can imagine hundreds per minute during Pacific working hours today.

[1] https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

replies(1): >>35477320 #
39. xxpor ◴[] No.35477303{5}[source]
IMO there's almost never a good reason to have automated commits in repos outside of two cases:

1) Automated refactoring

2) Automated merges when CI passes

Configs that can be generated should just be generated by the build.

But that's a different topic

replies(2): >>35477761 #>>35484038 #
40. xxpor ◴[] No.35477320{5}[source]
In the case of a monorepo, that's exactly the case why the build system shouldn't be overly complex. If you're expecting random people to make changes to your stuff, you shouldn't be burdening them with more complexity than necessary.

The monorepo case is also a little bit outside what I was originally talking about. I was mostly referring to individual services/libraries/apps.

41. esprehn ◴[] No.35477370{3}[source]
Yes, the codebases at Google and FB contain billions of files. Article from 2016 about the scale, and of course it's only grown dramatically since then: https://m-cacm.acm.org/magazines/2016/7/204032-why-google-st...
42. rrdharan ◴[] No.35477761{6}[source]
There are at least two other hugely important use cases you missed:

- automatic security / vendoring updates (e.g. https://github.com/renovatebot/renovate)

- automated cross-repo syncs, e.g. Google has processes and tools that bidirectionally sync pieces of Google3 with GitHub repos

43. pxc ◴[] No.35478153{3}[source]
Nix is Turing complete, but it's not a general purpose language. It is designed as a DSL for building software, and I think it's pretty nice for that.

The Nickel rationale doc has some thoughts on why this might be the right call: https://github.com/tweag/nickel/blob/master/RATIONALE.md#tur...

From my (limited) experience with another deliberately limited configuration DSL (CUE), I think more power in such DSLs will pan out better in the long run. Of course, it's not all one or the other: a powerful build DSL can still enforce useful discipline, and a Turing-complete language can still be thoughtfully designed around a special purpose. I think Nix demonstrates both pretty well, actually.

44. lmm ◴[] No.35478377{3}[source]
On the contrary, large systems have to restrict what their build system does because otherwise the complexity becomes unmanageable. I used to work on a large codebase (~500 committers, ~10MLOC) that had made the decision to use Gradle because they thought they needed it, but then had to add increasingly strict linters/code review/etc. to the gradle build definitions to keep the build maintainable. In the end they had a build that was de facto just as restricted as something like Maven, and the Turing completeness of Gradle did nothing but complicate the build and slow it down.

And sure, maybe having a restricted build definition (whether by using a restricted tool or by doing code review etc.) moves the complexity somewhere else, like into the actual code implementation. But it's easier to manage there. The build system is the wrong place for business logic, because it's not somewhere most programmers ever think to look for it.

45. lmm ◴[] No.35478396{7}[source]
The fact that it's implemented and not on by default is a red flag any way you slice it. Either it's implemented but unreliable, or it's reliable but the maintainers don't think it's worth turning on for some reason (why?).
46. Too ◴[] No.35478403{5}[source]
On clean builds the Python tax will be dwarfed by the thousands of calls to clang, yes. That’s not the scenario you need to optimize for. What’s more important is that incremental builds are snappy, since that is what developers do 100 times per day.

I’ve seen some projects with 100MB+ ninja files where even ninja itself, proud of being written in optimized C++, takes a second or two to parse them on each build invocation. Convert that to Python and you likely land in the 5-20 sec range instead. Enough to alt-tab and get distracted by something else. Google’s code base is likely even larger than this.

A background daemon that holds the graph in memory would probably handle it. In the big scheme such a design is likely better anyway. But needs a big upfront design and is a lot more complex than just reparsing a file each time.

Side note: For some, even the interpreter startup is annoying. Personally I find it negligible, especially after 3.11 you can almost claim it’s snappy.

replies(1): >>35481376 #
47. Too ◴[] No.35478538{3}[source]
No no no no. The more code you have, the more you have to constrain the builds.

I understand where the sentiment comes from, having seen one too many example of people struggling to implement basic logic in cmake or groovy, that would be a oneliner in python. But completely opening up the floodgates is not the right solution.

Escape hatches into GP languages can still exist, but the interfaces to them need to be strict, and it’s better that people see this boundary clearly, rather than limping around trying to do GP inside cmake and failing on correctness anyway. Everything else should, like the parent says, just be a list of files.

Dependencies need to be declarative and operations hermetic.

Otherwise the spaghetti of complexity will just keep growing. Builds and tests will take forever because there’s no way to detect which change affects which subsystem or what can be parallelized, and it’s even worse when incremental builds stop working.

By constraining what can be done, you also empower developers to do whatever they want, within said boundaries, without having to go through an expert build-team. Think about containers, it allowed every team to ship whatever they want without consulting the ops team.

replies(1): >>35483938 #
48. dikei ◴[] No.35478583{6}[source]
Exactly, one of the key selling points of Bazel/Buck is their caching systems: a very high cache hit rate with no inconsistency, which allows very fast incremental builds where a 0-change build takes close to 0 seconds.
49. Shish2k ◴[] No.35480290{7}[source]
You could do those tricks and cut down memory, perhaps even 10x, but they come at the cost of increased CPU time. Designing the system in such a way that you only ever need to load a tiny subset of the graph at one time gives you a 1000x saving for memory and CPU.
replies(1): >>35481806 #
50. taeric ◴[] No.35481376{6}[source]
Code bases that big are strawmen for most companies. Yes, they happen; but just as often they should be segmented into smaller things that don't require monolithic build setups.
replies(1): >>35484881 #
51. nextaccountic ◴[] No.35481806{8}[source]
Some of those tricks may actually decrease CPU time (by fetching less data from RAM and using the CPU cache more effectively). And you can also apply any optimizations for partial loading on top of that.

I guess the downside is that the system would be more complex overall, but you can probably get 80% of the result with not so large changes

52. sangnoir ◴[] No.35483938{4}[source]
> The more code you have, the more you have to constrain the builds.

That works if you have one team - or if all teams work the same way. If you have multiple teams with conflicting requirements[1], you absolutely should not constrain the build because you'd be getting in the way.

1. E.g. Team A uses an internal C++ lib in an online service and prefers an evergreen version of it to be automatically applied with minimal human involvement. Team B uses the same lib on physical devices shipped to consumers/customers. Updates are infrequent (annual), but have to be tested thoroughly for qualification. Now your build system has to support evergreen dependencies and versioned ones. If you drop support for either, you'll be blocking one team or the other from doing work.
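
In build-file terms that's roughly the difference between these two declarations living in the same repo (everything here is made up, just to illustrate the conflict):

    # Team A (online service): evergreen, always builds against corelib at HEAD
    cc_binary(
        name = "search_service",
        srcs = ["main.cc"],
        deps = ["//corelib:client"],
    )

    # Team B (shipped devices): pinned to a qualified, thoroughly tested snapshot
    cc_binary(
        name = "device_firmware",
        srcs = ["main.cc"],
        deps = ["//corelib/releases/2023_q4:client"],
    )

The build system has to let both coexist without one team's policy breaking the other's.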

53. sangnoir ◴[] No.35484038{6}[source]
> IMO there's almost never a good reason to have automated commit

This depends entirely on the quality of dev tools available.

Also, commit =/= shipped code: you may have automated commits and keep a human in the loop before shipping, by way of a rejectable Pull Request (or the proprietary equivalent).

A simple library upgrade will result in a wave of commits/bot-authored PRs

1. Human makes a change to a core library, changing it from v1 to v2

2. Bot identifies all call-sites and refactors to v2-equivalent, creating 50 PRs for 50 different teams.

One change, 51 commits.

54. joshuamorton ◴[] No.35484881{7}[source]
The context for this thread was whether Facebook considered waf in particular, so it is very relevant.
replies(1): >>35486237 #
55. klodolph ◴[] No.35486158{5}[source]
I’m not trying to react to your comments as if they’re hostile, just hope to clear the air. I like defending Buck and Bazel a little bit, and at the same time, I really recognize that they are painful to adopt, don't solve everyone’s problems, etc.

Waf does seem like a “do things as you like” framework, and I think that notion is antithetical to the Buck and Bazel design ethos. Buck and Bazel’s design is, “This is the correct way to do things; other ways are prohibited.” You fit your project into the Buck/Bazel system (which could be a massive headache for some) and in return you get a massive decrease in build times, as well as some other benefits like good cross-compilation support.

One fundamental part of the Buck/Bazel design is that you can load any arbitrary subset of the build graph. Your repository has BUILD files in various directories. Those directories are packages, and you only load the subset of packages that you need for the targets you are actually evaluating during evaluation time. You can even load a child package without loading the parent—like, load //my/cool/package without loading //my/cool or //my.

The build graph construction also looks somewhat different. There is an additional layer. In build systems like Waf, you have some set of configuration options, and the build scripts generate a graph of actions to perform which create the build using that configuration. In Buck/Bazel, there is an additional layer—you create a platform-agnostic build graph first (targets, which are specified using rules like cc_library), and then there’s a second analysis phase, which converts rules like “this is a cc_library” into actual actions like “run GCC on this file”.
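
A toy model of that extra layer, nothing like the real implementations, just to show the shape of it: loading produces platform-agnostic targets, and a separate analysis step turns each target into concrete actions for the configuration being built.

    # Toy sketch of the loading/analysis split (invented, not Buck/Bazel code).
    from dataclasses import dataclass, field

    @dataclass
    class Target:                # what the loading phase records from BUILD files
        kind: str                # e.g. "cc_library"
        name: str
        srcs: list
        deps: list = field(default_factory=list)

    @dataclass
    class Action:                # what the analysis phase emits
        cmd: list
        inputs: list
        outputs: list

    def analyze(target, config):
        """Turn one platform-agnostic target into concrete actions."""
        cxx = config["cxx"]      # e.g. "g++" or an aarch64 cross-compiler
        objs, actions = [], []
        for src in target.srcs:
            obj = src.replace(".cc", ".o")
            actions.append(Action([cxx, "-c", src, "-o", obj], [src], [obj]))
            objs.append(obj)
        lib = "lib%s.a" % target.name
        actions.append(Action(["ar", "rcs", lib] + objs, objs, [lib]))
        return actions

    for a in analyze(Target("cc_library", "strings", ["strings.cc"]), {"cxx": "g++"}):
        print(a.cmd)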

These extra layers are there, as far as I can tell, to support the goals of isolating different parts of your build system from each other. If they’re isolated, then you have better confidence that they will produce the same outputs every time, and you can make more of the build process parallelizable—not just the actual build actions, but the act of loading and analyzing build scripts.

I do think that there is room to appreciate both philosophies—the “let’s make a flexible platform” philosophy, and the “let’s make a strict, my-way-or-the-highway build system” philosophy.

56. taeric ◴[] No.35486237{8}[source]
Certainly fair. I had meandered on to "in general" way too quickly.
57. klodolph ◴[] No.35488520{3}[source]
Interesting. I know that for Buck 1, some workloads didn’t fit entirely in RAM.
58. tadfisher ◴[] No.35490145{3}[source]
Nix forces you to serialize every build step (what it calls a "derivation"), and moreover it isolates the build environment to only include things built with Nix or verified by hash. So while there is a lot of power, the only thing you can do with that power is produce derivations which themselves actually run the build.

Contrast this with Gradle, which is currently digging itself out of a hole by forcing authors to declare all inputs and outputs of their tasks so it can serialize them, but you can literally do anything Java can throughout the entire process. This is the kind of Herculean task that is neatly sidestepped by tightly controlling the DSL environment (inputs/outputs), as Nix does.