    137 points cdesai | 11 comments
    Ericson2314 ◴[] No.45670912[source]
    The headline times are a bit ridiculous. Are they trying to turn https://github.com/facebook/sapling/blob/main/eden/fs/docs/O... or some git fuse thing into a product?
    replies(2): >>45671058 #>>45671060 #
    1. zokier ◴[] No.45671060[source]
    Well, they also claim to be able to cache build steps, somehow independently of the build system.

    > As the build runs, any step that exactly matches a prior record is skipped and the results are automatically reused

    > SourceFS delivers the performance gains of modern build systems like Bazel or Buck2 – while also accelerating checkouts – all without requiring any migration.

    Which sounds way too good to be true.

    replies(4): >>45671827 #>>45671987 #>>45672236 #>>45677349 #
    2. fukka42 ◴[] No.45671827[source]
    Seems viable if you can wrap each build step with a start/stop signal.

    At the start snapshot the filesystem. Record all files read & written during the step.

    Then when this step runs again with the same inputs you can apply the diff from last time.

    Some magic to hook into processes and do this automatically seems possible.
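
    The snapshot-and-diff idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how SourceFS works: the names `snapshot`, `record_step`, and `apply_diff` are made up for this example, the "snapshot" is a plain hash of every file under a directory, and only writes and deletions are captured. Recording which files a step reads would need real tracing (FUSE, ptrace, or similar), which is left out here.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to a SHA-256 of its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            state[os.path.relpath(full, root)] = digest
    return state


def record_step(root, step):
    """Run step() and return the diff it caused under root.

    step is a hypothetical zero-argument callable standing in for one
    build step wrapped by the start/stop signal.
    """
    before = snapshot(root)
    step()
    after = snapshot(root)
    written = {}
    for rel, digest in after.items():
        # New or modified files are part of the step's output.
        if before.get(rel) != digest:
            with open(os.path.join(root, rel), "rb") as f:
                written[rel] = f.read()
    deleted = [rel for rel in before if rel not in after]
    return {"written": written, "deleted": deleted}


def apply_diff(root, diff):
    """Replay a recorded diff instead of running the step again."""
    for rel, data in diff["written"].items():
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full) or ".", exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)
    for rel in diff["deleted"]:
        full = os.path.join(root, rel)
        if os.path.exists(full):
            os.remove(full)
```

    A real implementation would also have to decide when replaying is safe, meaning that the command line and every input the step read are unchanged. Hashing the whole tree on every step, as this sketch does, would be far too slow for a large checkout.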

    replies(1): >>45674101 #
    3. vlovich123 ◴[] No.45671987[source]
    Yeah, I agree. This part is hand-waved away without any technical description of how they manage to pull this off, since knowing what even counts as a build step, and what its dependencies and outputs are, is only possible at the process level (to disambiguate multi-threaded builds). And then there are build steps that have side effects, which come up a lot with CMake+ninja.
    replies(1): >>45673038 #
    4. MangoToupe ◴[] No.45672236[source]
    You could manage this with a deterministic VM, cf. Antithesis.
    5. rcxdude ◴[] No.45673038[source]
    A fuse filesystem can get information about the thread performing the file access: https://man.openbsd.org/fuse_get_context.3

    So they could in principle get a full list of dependencies of each build step. Though I'm not sure how they would skip those steps without having an interposer in the build system to shortcut it.

    replies(3): >>45674200 #>>45674223 #>>45674474 #
    6. bananaquant ◴[] No.45674101[source]
    I think I got the magic part. You can store all build system binaries in the VFS itself. When any binary gets executed, the VFS can return a small sham binary instead that just checks the command-line arguments; if they match, it checks the inputs, and if those match too, it applies the previous output. If there is any mismatch, it can execute the original binary as usual and record the new output. Easy and no process hacking necessary.
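
    A rough Python sketch of that sham-binary logic, to make the two-stage check concrete. Everything here is an assumption for illustration: `shim`, the JSON record layout, and the `run_real` callback are invented names, and `run_real(argv)` stands in for "execute the original binary and have the VFS report which paths it read and wrote". Nothing in the thread confirms that SourceFS does it this way.

```python
import base64
import hashlib
import json
import os


def _digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def _record_path(store, argv):
    # One record per distinct command line.
    name = hashlib.sha256("\0".join(argv).encode()).hexdigest()
    return os.path.join(store, name + ".json")


def shim(store, argv, run_real):
    """Replay the recorded result for argv if its inputs are unchanged.

    run_real(argv) is a hypothetical callback that runs the original
    binary and returns (inputs, outputs): the paths it read and wrote.
    Returns "replayed" or "executed".
    """
    path = _record_path(store, argv)
    if os.path.exists(path):  # stage 1: the command line matches
        with open(path) as f:
            record = json.load(f)
        unchanged = all(
            os.path.exists(p) and _digest(p) == d
            for p, d in record["inputs"].items()
        )
        if unchanged:  # stage 2: the inputs match too
            for out, blob in record["outputs"].items():
                with open(out, "wb") as f:
                    f.write(base64.b64decode(blob))
            return "replayed"

    # Any mismatch: run the original binary and record the new result.
    inputs, outputs = run_real(argv)
    record = {"inputs": {p: _digest(p) for p in inputs}, "outputs": {}}
    for out in outputs:
        with open(out, "rb") as f:
            record["outputs"][out] = base64.b64encode(f.read()).decode("ascii")
    os.makedirs(store, exist_ok=True)
    with open(path, "w") as f:
        json.dump(record, f)
    return "executed"
```

    The sketch keeps only the latest record per command line and ignores environment variables, the working directory, and steps that modify their own inputs, all of which a real tool would have to handle.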
    7. vlovich123 ◴[] No.45674200{3}[source]
    Yeah, that’s what I meant. I bet you the build must be invoked through a wrapper script that interposes all executables launched within the product tree. Complicated, but I think it could work. Skipping steps correctly is the hard part, but maybe you do that by somehow knowing ahead of time the files that will be accessed by that process, and then skipping the launch and materializing the output (they also mention they have to run it once in a sandbox to detect the dependencies). But still, side effects in build systems seem difficult to account for correctly; I bet you that’s why it’s a “contact us” kind of product - there’s work needed to make sure it actually works on your project.
    8. mook ◴[] No.45674223{3}[source]
    Didn't tup do something like that? https://gittup.org/tup/index.html Haven't looked at it in a while, no idea if it got adoption.

    But initially the article sounded like it was describing a mix of tup and Microsoft's git vfs (https://github.com/microsoft/VFSForGit) mushed together. But doing that by itself is probably a pile of work already.

    replies(1): >>45675075 #
    9. ◴[] No.45674474{3}[source]
    10. serbancon ◴[] No.45675075{4}[source]
    Yes, you are correct - SourceFS also caches and replays build steps in a generic way. It works surprisingly well, to the point where it’s hard to believe until you actually see it in action (here is a short demo video, but it probably isn't the best way to showcase it: https://youtu.be/NwBGY9ZhuWc?t=76 ).

    We intentionally kept the blog post light on implementation details - partly to make it accessible to a broader audience, and partly because we will gradually be posting more details. Sounds like build caching/replay is high on the desired blogpost list - ack :-).

    The build-system integration used here was a one-line change in the Android build tree. That said, you’re right - deeper integration with the build system could push the numbers even further, and that’s something we’re actively exploring.

    11. CJefferson ◴[] No.45677349[source]
    I used to use a Python program called ‘fabricate’ which did this. If you track every file a compiler opens, then if the same compiler is run with the same flags, and no input changed, you can just drop a cached copy of the outputs in place.

    I’m actually disappointed this type of thing never caught on. It’s fairly easy on Linux to track every file a program accesses, so why do I need to write dependency lists?
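
    As a small illustration of that kind of tracking, here is a sketch that pulls the files a command read and wrote out of an strace log. It assumes the log came from something like `strace -f -e trace=open,openat`, and it only understands the common line shape for those two calls. The function name `parse_strace` and the regular expression are invented for this example, and this is not a description of how fabricate itself works.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "main.c", O_RDONLY|O_NOCTTY) = 3
#   [pid 4242] openat(AT_FDCWD, "main.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
_OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:[^",]+, )?"((?:[^"\\]|\\.)*)", '
    r'([A-Z0-9_|]+)(?:, [0-7]+)?\) = (-?\d+)'
)


def parse_strace(lines):
    """Return (reads, writes): sets of paths opened successfully."""
    reads, writes = set(), set()
    for line in lines:
        m = _OPEN_RE.search(line)
        if not m:
            continue
        path, flags, ret = m.group(1), m.group(2).split("|"), int(m.group(3))
        if ret < 0:
            continue  # failed opens (e.g. ENOENT) are ignored in this sketch
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            writes.add(path)
        if "O_RDONLY" in flags or "O_RDWR" in flags:
            reads.add(path)
    return reads, writes
```

    Paths are returned exactly as strace printed them, so escape sequences and relative paths are not resolved. A stricter tool would also treat failed opens as dependencies, since a header that starts to exist later can change the result of a build step.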