    764 points bertman | 35 comments
    1. c0l0 ◴[] No.43484720[source]
    I never really understood the hype around reproducible builds. It seems to mostly be a vehicle to enable tivoization[0] while keeping users sufficiently calm. With reproducible builds, a vendor can prove to users that they did build $binary from $someopensourceproject, and then digitally sign the result so that it - and only it - would load and execute on the vendor-provided and/or vendor-controlled platform. But that still kills effective software freedom as long as I, the user, cannot do the same thing with my own build (whether it is unmodified or not) of $someopensourceproject.

    Therefore, I side with Tavis Ormandy on this debate: https://web.archive.org/web/20210616083816/https://blog.cmpx...

    [0]: https://en.wikipedia.org/wiki/Tivoization

    replies(12): >>43484745 #>>43484754 #>>43484942 #>>43485078 #>>43485108 #>>43485155 #>>43485403 #>>43485551 #>>43485635 #>>43486702 #>>43487034 #>>43492779 #
    2. klysm ◴[] No.43484745[source]
    One of the big advantages from my perspective is you can cache a lot more effectively throughout the build process when things are deterministic.
    replies(1): >>43484849 #
    3. oulipo ◴[] No.43484754[source]
    Reproducible builds are also important for:
    - caching artefacts
    - ensuring there's no malware somewhere that's been added in the build process
    replies(2): >>43484893 #>>43484902 #
    4. c0l0 ◴[] No.43484849[source]
    To achieve that it is enough to hash inputs, and cache resulting outputs. Repeating a build from scratch with an empty cache would not necessarily have to yield the same hashes all the way down to the last artifact, but that's actually a simplification of the whole process, and not a bad thing per se.
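
    In a rough sketch (Python here just for illustration; the cache directory and the gcc invocation are placeholders, not anything Debian or any build tool actually uses), that looks something like:

        import hashlib, os, shutil, subprocess

        CACHE = "build-cache"  # hypothetical local cache directory

        def input_key(paths, command):
            # Key the cache on a hash of the build command plus every declared input file.
            h = hashlib.sha256(" ".join(command).encode())
            for p in sorted(paths):
                with open(p, "rb") as f:
                    h.update(f.read())
            return h.hexdigest()

        def build_cached(inputs, command, output):
            key = input_key(inputs, command)
            cached = os.path.join(CACHE, key)
            if os.path.exists(cached):
                shutil.copyfile(cached, output)   # cache hit: reuse the stored output
                return "hit"
            subprocess.run(command, check=True)   # cache miss: run the real build step
            os.makedirs(CACHE, exist_ok=True)
            shutil.copyfile(output, cached)       # store the result under its input hash
            return "miss"

        # e.g. build_cached(["main.c"], ["gcc", "-O2", "-o", "main", "main.c"], "main")

    Note that nothing here requires the output bytes themselves to be identical across machines; only the input hashing has to be stable.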
    replies(2): >>43485374 #>>43486011 #
    5. mjevans ◴[] No.43484893[source]
    Auditors can take a copy of the source, reproducibly build it themselves, and thus prove that the binaries someone would like to run match the provided source code.
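
    Something along these lines, assuming the build really is reproducible (the paths and the make invocation below are placeholders):

        import hashlib, subprocess

        def sha256(path):
            with open(path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()

        # Rebuild from the published source tree (placeholder build command).
        subprocess.run(["make", "-C", "source-tree"], check=True)

        shipped = sha256("vendor-release/app.bin")   # binary the vendor distributes
        rebuilt = sha256("source-tree/app.bin")      # binary we just built ourselves

        if shipped == rebuilt:
            print("binary matches the published source")
        else:
            print("MISMATCH: binary was not built from this source")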
    6. AceJohnny2 ◴[] No.43484902[source]
    > ensuring there's no malware somewhere that's been added in the build process

    i.e. supply-chain safety

    It doesn't entirely resolve Thompson's "Trusting Trust" problem, but it goes a long way.

    replies(1): >>43485077 #
    7. __MatrixMan__ ◴[] No.43484942[source]
    You can still slip malware into a reproducible build, but you have to do it in the open. If you do it by injecting a tampered-with artifact through some side channel specific to your target, they will end up with a hash that doesn't agree with the one trusted by the rest of the community, and will have reason for suspicion.

    That benefit goes away if the rest of the community all have hashes that don't agree with each other. Then the tampered-with one doesn't stand out.

    8. 0cf8612b2e1e ◴[] No.43485077{3}[source]
    Is it possible for mortals to rebuild gcc from scratch? Can I start with some minimal, auditable compiler (tcc?) and build up to a modern gcc? Or would it be some byzantine path where I need to compile gcc v1998, then perl, then Python 1.8, enabling me to compile gcc v2005, which lets me build Python 2.3, etc.
    replies(5): >>43485390 #>>43485858 #>>43487062 #>>43487064 #>>43487173 #
    9. inglor_cz ◴[] No.43485078[source]
    It is not that different from tamper-proofing medications. It proves that no one added poison to whatever you are consuming, after that thing left its "factory".
    10. myrmidon ◴[] No.43485108[source]
    Let's turn this around. Why would you ever want non-reproducible builds?

    Every bit of nondeterminism in your binaries, even if it's just memory layout alone, might alter the behavior, i.e. break things on some builds, which is just really not desirable.

    Why would you ever want builds from the same source to have potentially different performance, different output size or otherwise different behavior?

    IMO tivoization is completely unrelated, because the vendor most certainly does not need reproducible builds in order to lock down a platform.

    replies(1): >>43486473 #
    11. ahlCVA ◴[] No.43485155[source]
    For me as a developer, reproducible builds are a boon during debugging, because I can be sure that I have reproduced the build environment corresponding to an artifact out in the real world (which is not trivial, particularly for more complex things like whole OS image builds, common in the embedded world) precisely when I need to troubleshoot something.

    Then I can be sure that I only make the changes I intend to do when building upon this state (instead of, for example, "fixing" something by accident because the link order of something changed which changed the memory layout which hides a bug).

    replies(1): >>43490024 #
    12. mschuster91 ◴[] No.43485374{3}[source]
    > To achieve that it is enough to hash inputs, and cache resulting outputs.

    Thing is, inputs can be nondeterministic too - some programs (used to) embed the current git commit hash into the final binary so that a `./foo --version` gives a quick and easy way for bug triage to check if the user isn't using a version from years ago.

    replies(2): >>43485507 #>>43486399 #
    13. uecker ◴[] No.43485390{4}[source]
    It is a byzantine path, also because gcc switched to C++ at some point (for no good reason IMHO). But there is a project that maintains such a bootstrap path: https://www.gnu.org/software/mes/
    14. layer8 ◴[] No.43485403[source]
    There is merit to some of the security arguments. However, one thing reproducible builds enable is to reliably identify the source code version from which a particular build was produced. If a build artifact is found to have undesirable behavior (whether malicious or just a genuine bug or misdesign), reproducible builds make it possible to reliably trace that behavior back to the source code, and then to modify only the undesired behavior. If, on the other hand, you can't identify the corresponding source code version with certainty, and therefore have to fix the behavior based on a possibly different version of the source code (or of the build environment), then you don't know that it doesn't additionally contain any new undesired behaviors.
    15. layer8 ◴[] No.43485507{4}[source]
    This is only a problem if those nondeterministic inputs are actually included in the hash. This is often not the case, because the values are included implicitly in the build rather than explicitly.

    (Just playing devil’s advocate here.)
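
    A toy illustration of the point above (file names made up): only the declared source feeds the hash, while the timestamp sneaks into the output implicitly, so the cache key stays stable even though the generated file differs between builds.

        import hashlib, time

        with open("version.c.in") as f:      # declared input: this is what gets hashed
            template = f.read()
        cache_key = hashlib.sha256(template.encode()).hexdigest()

        # Implicit input: the current time never enters the cache key,
        # yet it changes the generated file on every build.
        with open("version.c", "w") as f:
            f.write(template.replace("@BUILD_TIME@", str(int(time.time()))))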

    16. IshKebab ◴[] No.43485551[source]
    Tivoisation doesn't depend on reproducible builds at all. Vendors don't need to mathematically prove the exact origin of their binaries.
    17. rcxdude ◴[] No.43485635[source]
    It basically means that not everybody needs to build from source code if they want to verify that the binaries they're using haven't had malware injected during the build process. I.e. so long as enough people check that they can reproduce the build, and call out any case where it doesn't match, everyone else can just use the binaries without building from source. This means auditing efforts can focus just on the source code, which is a lot more tractable (but still hard, and imperfect. But it means a potential attacker needs to work a lot harder, as opposed to a compromise of the build servers basically giving them free rein without much risk of detection).

    It doesn't really do anything at all for tivoisation; Tivo managed it just fine without reproducible builds.

    18. tetha ◴[] No.43485858{4}[source]
    Mh. Though, if you have deterministic builds for GCC, imagine how much of a problem some nerd in Northern Washington or Scandinavia with their own strange C build chain would be for anyone trying to inject something strange into these compilers during the build process.

    Like, you spend millions to get that one backdoor into the compiler. And then this guy is like "Uhm. Guys. I have this largely perl-based build process reproducing a modern GCC on a Pentium with 166 MHz, swapping RAM to disk because the motherboard can't hold that much memory. But the desk fan helps cooling. It takes about 2 months or 3 to build, but that's fine. I start it and then I work in the woods. It was identical to your releases about 5 times in the last 2 years (can't build more often), and now it isn't; the difference is somewhere deep in the code sections. My Arduino-based floppy emulator is currently moving the binaries through the network."

    Sure, it's a cyberpunk hero-fantasy, but deterministic builds would make this kind of shenanigans possible.

    And at the end of the day, independent validation is one of the strongest ways to fight corruption.

    19. klysm ◴[] No.43486011{3}[source]
    Outputs are used as inputs later. If everything is deterministic, you can actually cache everything by hash.
    20. telotortium ◴[] No.43486399{4}[source]
    Adding the Git hash is reproducible, assuming you build from a clean tree (which the build script can check). Embedding the current date and time is the canonical cause of non-reproducibility, but that can be worked around in most cases by embedding the commit and/or author date of the commit instead.
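
    A sketch of that workaround (assuming a clean checkout; the header name and macro are made up): stamp the binary from the commit hash and the committer date rather than the wall clock, so rebuilding the same commit embeds the same bytes.

        import subprocess

        def git(*args):
            out = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
            return out.stdout.strip()

        if git("status", "--porcelain"):
            raise SystemExit("refusing to stamp a dirty working tree")

        commit = git("rev-parse", "--short", "HEAD")
        date = git("log", "-1", "--format=%cI")   # committer date of HEAD, not the build time

        with open("version.h", "w") as f:
            f.write(f'#define APP_VERSION "{commit} ({date})"\n')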
    21. RJIb8RBYxzAMX9u ◴[] No.43486473[source]
    > Let's turn this around. Why would you ever want non-reproducible builds?

    It's not about wanting non-reproducible builds, but about what I am sacrificing to achieve reproducible builds. Debian's reproducible build efforts have been going for ten years, and it's still not complete. Arguably Debian could have diverted ten years of engineering resources elsewhere. There's no end to the list of worthwhile projects to tackle, and clearly Debian believes that reproducible builds are a high priority, but reasonable people can disagree on that.

    This is not to say reproducible builds are not worth doing, just that depending on your project / org lifecycle and available resources (plus a lot of subjective judgement), you may want to do something else first.

    replies(1): >>43486796 #
    22. pavon ◴[] No.43486702[source]
    Tavis makes some good arguments, but since that post I've seen a couple real-world situations where reproducible builds are valuable.

    One is where the upstream software developer wants to build and sign their software so that users know it came from them, but distributors also want to be the ones to build and sign the software so they know what exactly it is they are distributing. The most public example is FDroid[1]. Reproducible builds allow both the software developer and the distributor to sign off on a single binary, giving users additional assurance that neither is sneaking something in. This is similar to the last example that Tavis gave, but shows that it is a workable process that provides real security benefit to the user, not just a hypothetical stretch.

    The second is license enforcement. Companies that distribute (A/L)GPL software are required to distribute the exact source code that the binary was created from and, for GPLv3, to provide the ability to compile and replace the software with a modified version. However, a lot of companies are lazy about this and publish source code that doesn't include all their changes. A reproducible build demonstrates that the source they provided is what was used to create the binary. Of course, the lazy ones aren't going to go out of their way to create reproducible builds, but the more reproducible the upstream build system is, the fewer extraneous differences downstream builds should have. And it allows greater confidence in the good guys who are following the license.

    And like others have said, I don't see the Tivoization argument at all. TiVo didn't have reproducible builds, and they Tivo'd their software just fine. At worst a reproducible build might pacify some security-minded folks that would otherwise object to Tivoization, but there will still be people who object to it out of the desire to modify the system.

    [1] https://f-droid.org/docs/Reproducible_Builds/

    23. progval ◴[] No.43486796{3}[source]
    Debian didn't "divert engineering resources" to this project. People, some of whom happen to be Debian developers, decided to work on it for their own reasons. If the Reproducible Builds effort didn't exist, it doesn't mean they would have spent more time working on other areas of Debian. Maybe even less, because the RB effort was an opportunity to find and fix other bugs.
    replies(1): >>43487311 #
    24. bobmcnamara ◴[] No.43487034[source]
    > This diagram demonstrates how to get a trusted binary without reproducible builds.

    Ages ago our device firmware release processes caught the early stage of a malware infection because the hash of one of our intermediate code generators (win32 exe) changed between two adjacent releases without any commits that should've impacted that tool.

    Turns out they had hooked something into Windows to monitor for exe accesses and were accidentally patching our codegen.
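
    The check that catches this can be as simple as recording the tool hashes at each release and diffing them the next time; a rough sketch with made-up manifest paths:

        import glob, hashlib, json

        def tool_hashes(tool_dir):
            # Hash every build-tool binary in the directory (made-up layout).
            return {p: hashlib.sha256(open(p, "rb").read()).hexdigest()
                    for p in sorted(glob.glob(f"{tool_dir}/*.exe"))}

        previous = json.load(open("release-manifests/last.json"))  # hashes recorded at the previous release
        current = tool_hashes("toolchain")

        for path, digest in current.items():
            if path in previous and previous[path] != digest:
                print(f"ALERT: {path} changed with no commit that should have touched it")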

    Eventually you just stop trusting anything and live in the woods, I guess.

    25. XorNot ◴[] No.43487062{4}[source]
    It is sort of like that. It's been documented: https://github.com/fosslinux/live-bootstrap/

    (This is an alternative to the Guix/Scheme thing).

    26. ◴[] No.43487064{4}[source]
    27. fsflover ◴[] No.43487173{4}[source]
    https://news.ycombinator.com/item?id=41368835
    28. RJIb8RBYxzAMX9u ◴[] No.43487311{4}[source]
    Yes, the system is not closed and certainly people may simply not contribute to Debian at all. However, my main point is that reasonable people disagree on the relative importance of reproducible builds among other things, so it's not about "want[ing] non-reproducible builds" even if one has unlimited resources, but rather wanting reproducible builds, just not at the expense of X, where X differs from person to person.
    replies(1): >>43488588 #
    29. robertlagrant ◴[] No.43488588{5}[source]
    "It's possible to disagree on whether a feature is worth doing" is technically true, but why is it worth discussing time spent by volunteers on something already done? People do all sorts of things in their free time; what's the opportunity cost there?
    30. signa11 ◴[] No.43490024[source]
    So what you are looking for is a reproducible build environment? Things like Docker have been around doing just that for a while now.
    replies(3): >>43491514 #>>43491651 #>>43499869 #
    31. turboponyy ◴[] No.43491514{3}[source]
    Docker can be used to create reproducible environments (container images), but cannot be used to reproduce environments from source (running a Dockerfile will always produce a different output) - that is, the build definition and build artifact are not equivalent, which is not the case for tools like Nix.
    32. myrmidon ◴[] No.43491651{3}[source]
    > things like docker have been around doing just that for a while now.

    That's just not enough. If you are hunting down tricky bugs, then even extremely minor things like memory layout of your application might alter the behavior completely-- some uninitialized read might give you "0" every time in one build, while crashing everything with unexpected non-zero values in another; performance characteristics might change wildly and even trigger (or avoid) race conditions in builds from the exact same source thanks to cache interactions, etc.

    There is a lot of developer preference in what an "ideal" process/toolchain/build environment looks like, but reproducible builds (unlike a lot of things that come down to preference) are an objective, qualitative improvement-- in the exact same way that it is an improvement if every release of your software corresponds to one exact set of source code.

    replies(1): >>43494239 #
    33. chupasaurus ◴[] No.43492779[source]
    > With reproducible buiilds, a vendor can prove to users that they did build $binary from $someopensourceproject, and then digitally sign the result so that it - and only it - would load and execute on the vendor-provided and/or vendor-controlled platform.

    As long as Debian provides source packages and access to their repos, a digital signature has nothing to do with Reproducible Builds - you don't actually need one to get the same bytes.

    34. nottorp ◴[] No.43494239{4}[source]
    And he said embedded.

    That means it crashes on some device that is on a pole in the middle of nowhere, or in a factory where you have to wear armor to go debug it on site.

    Docker is cushy ... for servers and developer machines.

    35. ahlCVA ◴[] No.43499869{3}[source]
    I see reproducible builds more as a contract between the originator of an artifact and yourself today (the two might be the same person at different points in time!) saying "if you follow this process, you'll get a bit-identical artifact to what I have gotten when I followed this process originally".

    If that process involves Docker or Nix or whatever - that's fine. The point is that there is some robust way of transforming the source code to the artifact reproducibly. (The fewer moving parts are involved in this process, though, the better, just as a matter of practicality. Locking up the original build machine in a bank vault and having to use it to reproduce the binary is a bit inconvenient.)

    The point here is that there is a way for me to get to a "known good" starting point and that I can be 100% confident that it is good. Having a bit-reproducible process is the no-further-doubts-possible way of achieving that.

    Sure it is possible that I still get an artifact that is equivalent in all the ways that I care about if I run the build in the exact same Docker container even if the binaries don't match (because for example some build step embeds a timestamp somewhere). But at that point I'll have to start investigating if the cause of the difference is innocuous or if there are problems.

    Equivalence can only happen in one way, but there's an infinite number of ways to get inequivalence.