Looks like it's similar in some ways. But they don't say much either, and even the self-hosted variant is "Talk to us" pricing :/
Though from what I gather from the story, part of the speedup comes from how Android composes its build stages.
I.e. speeding things up by not downloading everything only helps if you don't actually need everything you'd download. And the savings add up when you download multiple times.
I'm not sure they can actually provide a speedup in a tight developer cycle with a local git checkout and a good build system.
And as for pricing... are there really that many people working on O(billion) lines of code who can't afford $TalkToUs? I'd reckon that Linux is the biggest source of hobbyist commits, and that checks out OK on my laptop (though I'll admit I don't really do much beyond ./configure && make there...)
> As the build runs, any step that exactly matches a prior record is skipped and the results are automatically reused
> SourceFS delivers the performance gains of modern build systems like Bazel or Buck2 – while also accelerating checkouts – all without requiring any migration.
Which sounds way too good to be true.
I was a build engineer in a previous life. Not for Android apps, but some of the low-effort, high-value tricks I used involved:
* Do your building in a tmpfs if you have the spare RAM and your build (or parts of it) can fit there.
* Don't copy around large files if you can use symlinks, hardlinks, or reflinks instead.
* If you don't care about crash resiliency during the build phase (and you normally shouldn't, since each build should run in a brand-new pristine reproducible environment that can be thrown away), skip useless I/O via libeatmydata and similar tools (see the sketch after this list).
* Cross-compilers are much faster than emulation for a native compiler, but there is a greater chance of missing some crucial piece of configuration and silently ending up with a broken artifact. Choose wisely.
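To make the first few concrete, here's a minimal sketch of a throwaway build wrapper along those lines, assuming a Linux host with libeatmydata on the loader path; the paths and the make invocation are placeholders, not anything from the article:

    import os
    import shutil
    import subprocess

    SRC = "/srv/src/project"        # hypothetical source checkout
    BUILD_DIR = "/dev/shm/build"    # tmpfs-backed scratch dir, lives in RAM

    def link_or_copy(src, dst):
        # Hardlink instead of copying where possible; cross-filesystem links
        # (e.g. into tmpfs) fail with EXDEV, so fall back to a real copy.
        try:
            os.link(src, dst)
        except OSError:
            shutil.copy2(src, dst)

    # Populate the scratch dir without duplicating file contents where we can.
    shutil.copytree(SRC, BUILD_DIR, copy_function=link_or_copy, dirs_exist_ok=True)

    # The build is throwaway, so skip fsync()s via libeatmydata's LD_PRELOAD shim.
    env = dict(os.environ, LD_PRELOAD="libeatmydata.so")
    subprocess.run(["make", "-j"], cwd=BUILD_DIR, env=env, check=True)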
The high-value high-effort parts are ruthlessly optimizing your build system and caching intermediate build artifacts that rarely change.
I.e. this isn't something battle-tested by hundreds of thousands of developers 24/7 over the last few years, but simply a commercial product sold by people who liked what they used.
Well, since Android is their flagship example: anyone who wants to build custom Android releases for some reason. The way things are, you don't need billions of lines of your own code to maybe benefit from tools that handle billions of lines of code.
At the start of a step, snapshot the filesystem. Record all files read & written during the step.
Then when the step runs again with the same inputs, you can apply the diff from last time.
Some magic to hook into processes and do this recording automatically seems possible.
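As a toy sketch of that record/replay idea (not how SourceFS does it - here the step's inputs and outputs are declared explicitly rather than discovered through the filesystem, and the cache path is made up):

    import hashlib, json, subprocess
    from pathlib import Path

    CACHE = Path("/var/cache/step-replay")   # hypothetical cache location

    def step_key(cmd, inputs):
        # Key a build step by its command line plus the content of its input files.
        h = hashlib.sha256(" ".join(cmd).encode())
        for p in sorted(inputs):
            h.update(Path(p).read_bytes())
        return h.hexdigest()

    def run_step(cmd, inputs, outputs):
        entry = CACHE / step_key(cmd, inputs)
        manifest = entry / "outputs.json"
        if manifest.exists():
            # Replay: restore the recorded outputs instead of re-running the step.
            for out, blob in json.loads(manifest.read_text()).items():
                Path(out).write_bytes((entry / blob).read_bytes())
            return
        subprocess.run(cmd, check=True)
        # Record: stash every declared output under this step's key for next time.
        entry.mkdir(parents=True, exist_ok=True)
        recorded = {}
        for i, out in enumerate(outputs):
            (entry / f"out{i}").write_bytes(Path(out).read_bytes())
            recorded[out] = f"out{i}"
        manifest.write_text(json.dumps(recorded))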
So they could in principle get a full list of dependencies of each build step. Though I'm not sure how they would skip those steps without having an interposer in the build system to shortcut it.
I’d be pretty happy if Git died and it was replaced with a full Sapling implementation. Git is awful, so that’d be great. Sigh.
Initially, though, the article sounded like it was describing a mix of tup and Microsoft's Git VFS (https://github.com/microsoft/VFSForGit) mushed together. But doing that by itself is probably a pile of work already.
The challenge with those systems is that they’re tightly coupled with the tools, infrastructure, and even developer distros used internally at Google and Meta, which makes them hard to generalize. SourceFS aims to bring that “Piper-like” experience to teams outside Google - but in a way that works with plain Git, Repo, and standard Linux environments.
Also, if I’m not mistaken, neither SrcFS nor EdenFS directly accelerates builds - most of that speed comes from the build systems themselves (Blaze/Buck). SourceFS goes a step further by neatly and simply integrating with the build system and caching/replaying pretty much any build step.
The Android example we’ve shown is just one application - it’s a domain we know well and one where the pain is obvious - but we built SourceFS in a way where we can easily integrate with a new build system and speed up other big codebases.
Also, you’re spot on that this problem mostly affects big organizations with complex codebases. Without that infrastructure and SRE support, the magic does not work (e.g. think of the Redis CVE 10.0 from last week or the AWS downtime this week) - and hence the “talk to us”.
We plan to gradually share more interesting details about how SourceFS works. If there’s something specific you’d like us to cover, let us know - and help us crowdsource our blogpost pipeline :-).
I think it was made by Microsoft: https://github.com/microsoft/p4vfs
The machine running SourceFS was a c4d-standard-16, and if I remember correctly, the results were very similar on an equivalent 8-vCPU setup.
As mentioned in the blog post, the results were 51 seconds for a full Android 16 checkout (repo init + repo sync) and ~15 minutes for a clean build (make) of the same codebase. Note that this run was mostly replay - over 99% of the build steps were served from cache.
For example, Mercedes’ MB.OS: “is powered by more than 650 million lines of code” - see: https://www.linkedin.com/pulse/behind-scenes-mbos-developmen...
We intentionally kept the blog post light on implementation details - partly to make it accessible to a broader audience, and partly because we will gradually be posting more of them. Sounds like build caching/replay is high on the desired blogpost list - ack :-).
The build-system integration used here was a one-line change in the Android build tree. That said, you’re right - deeper integration with the build system could push the numbers even further, and that’s something we’re actively exploring.
Incremental builds and diff-only pulls are not enough in a modern workflow. You either need to keep a fleet of warm builders, or you need to store and sync the previous build state onto fresh machines.
Games and I'm sure many other types of apps fall into this category of long builds, large assets, and lots of intermediate build files. You don't even need multiple apps in a repo to hit this problem. There's no simple off the shelf solution.
And a fleet of warm builders seems pretty reasonable at that scale.
SourceFS sounds useful for extra-smart caching, but some of these problems do sound like they're just bad, fixable configuration.
It's actually pretty hard. The more builders you have, the staler each builder's workspace gets, and scaling up or cycling machines makes the next builds on those machines super slow. Game engines end up making central intermediate asset caches like Unreal's UBA or Unity's Cache Server.
For example, the League of Legends source repo is millions of files and hundreds of GB in size, because it includes things like game assets, vendored compiler toolchains for all of our target platforms, etc. But to compile the game code, only about 15,000 files and 600MB of data are needed from the repo.
That means 99% of the repo is not needed at all for building the code, and that is why we are seeing a lot of success using VFS-based tech like the one described in this blog. In this case, we built our own virtual filesystem for source code based on our existing content-defined patching tech (which we wrote about a while ago [1]). It's similar to Meta's EdenFS in that we built it on top of the ProjFS API on Windows and NFSv3 on macOS and Linux. We can mount a view into the multimillion-file repo in 3 seconds, and file data (which is compressed and deduplicated and served through a CDN) is downloaded transparently when a process requests it. We use a normal caching build system to actually run the build, in our case FASTBuild.
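A toy sketch of the lazy, deduplicated fetch side (made-up URLs and helper names; the real implementation sits behind ProjFS/NFS and our content-defined patching format rather than being called directly):

    import hashlib, urllib.request
    from pathlib import Path

    CDN = "https://cdn.example.com/blobs"        # hypothetical content store
    LOCAL_CACHE = Path.home() / ".blobcache"     # deduplicated local chunk cache

    def fetch_chunk(chunk_hash):
        # Return a chunk by content hash, hitting the CDN only on a cache miss.
        cached = LOCAL_CACHE / chunk_hash
        if cached.exists():
            return cached.read_bytes()
        data = urllib.request.urlopen(f"{CDN}/{chunk_hash}").read()
        assert hashlib.sha256(data).hexdigest() == chunk_hash, "corrupt chunk"
        LOCAL_CACHE.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
        return data

    def materialize(chunk_hashes, dest):
        # Assemble a file from its chunk manifest the first time a process reads it.
        with Path(dest).open("wb") as out:
            for chunk_hash in chunk_hashes:
                out.write(fetch_chunk(chunk_hash))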
I recently timed it, and I can go from having nothing at all on disk to having locally built versions of the League of Legends game client and server in 20 seconds on a 32-core machine. This is with 100% cache hits, similar to the build timings mentioned in the article.
[1] https://technology.riotgames.com/news/supercharging-data-del...
[1] https://help.perforce.com/helix-core/server-apps/p4vfs/curre...
Objection! Long build times are better for sword-fighting time. The longer it takes, the more sword-fighting we have time for!
Device drivers would exist in kernel sources, not the AOSP tree.
I’m actually disappointed this type of thing never caught on. It’s fairly easy on Linux to track every file a program accesses, so why do I need to write dependency lists?
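For example, a rough sketch of discovering a command's file-level dependencies by tracing it with strace's %file syscall class (the gcc invocation is just an example):

    import re, subprocess, tempfile

    def discover_deps(cmd):
        # Run a command under strace and return every file it successfully opened.
        with tempfile.NamedTemporaryFile(mode="r", suffix=".strace") as log:
            subprocess.run(
                ["strace", "-f", "-e", "trace=%file", "-o", log.name] + cmd,
                check=True,
            )
            deps = set()
            for line in log.read().splitlines():
                # Lines look like: openat(AT_FDCWD, "/usr/include/stdio.h", O_RDONLY) = 3
                m = re.search(r'"([^"]+)"', line)
                if m and "= -1" not in line:   # skip paths the program probed but didn't find
                    deps.add(m.group(1))
            return deps

    if __name__ == "__main__":
        print(sorted(discover_deps(["gcc", "-c", "hello.c", "-o", "hello.o"])))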