Therefore, I side with Tavis Ormandy on this debate: https://web.archive.org/web/20210616083816/https://blog.cmpx...
i.e. supply-chain safety
It doesn't entirely resolve Thompson's "Trusting Trust" problem, but it goes a long way.
That benefit goes away if the rest of the community all have hashes that don't agree with each other. Then the tampered-with one doesn't stand out.
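To make that concrete: if builds are reproducible, a tampered artifact sticks out as the lone disagreeing hash, whereas without reproducibility every builder reports a different hash and there is no consensus to compare against. A minimal sketch of that comparison (builder names and hash values are made up for illustration):

```python
import hashlib
from collections import Counter

def sha256_of(path: str) -> str:
    """Each builder runs this over their own locally built artifact."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hashes reported by independent builders (placeholder values).
reported = {
    "alice":   "9f2a...e1",
    "bob":     "9f2a...e1",
    "carol":   "9f2a...e1",
    "mallory": "1c4d...07",
}

consensus, votes = Counter(reported.values()).most_common(1)[0]
outliers = [name for name, digest in reported.items() if digest != consensus]
print(f"consensus hash ({votes} builders): {consensus}")
print("disagreeing builders:", outliers or "none")
```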
Every bit of nondeterminism in your binaries, even if it's just memory layout, might alter behavior, i.e. break things on some builds, which is really not desirable.
Why would you ever want builds from the same source to have potentially different performance, different output size or otherwise different behavior?
IMO tivoization is completely unrelated, because the vendor most certainly does not need reproducible builds in order to lock down a platform.
Then I can be sure that I only make the changes I intend to make when building on top of this state (instead of, for example, "fixing" something by accident because the link order of something changed, which changed the memory layout, which hides a bug).
Thing is, inputs can be nondeterministic too - some programs (used to) embed the current git commit hash into the final binary so that `./foo --version` gives bug triage a quick and easy way to check that the user isn't running a version from years ago.
(Just playing devil’s advocate here.)
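For what it's worth, the commit hash itself is a function of the source tree, so stamping it in is still reproducible as long as everyone builds the same commit; what usually breaks reproducibility is stamping in genuinely unstable inputs alongside it, such as the build time. A sketch with made-up file names:

```python
import subprocess
from datetime import datetime, timezone

def head_commit() -> str:
    # Deterministic given the checkout: everyone building the same commit
    # gets the same string back.
    return subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

# Hypothetical generated file consumed by the rest of the build.
with open("version_stamp.py", "w") as f:
    f.write(f'VERSION = "1.2.3+{head_commit()}"\n')
    # This is the line that actually makes the output differ on every build:
    # the wall-clock time is not a function of the source.
    f.write(f'BUILT_AT = "{datetime.now(timezone.utc).isoformat()}"\n')
```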
It doesn't really do anything at all for tivoisation; TiVo managed it just fine without reproducible builds.
Like, you spend millions to get that one backdoor into the compiler. And then this guy is like "Uhm. Guys. I have this largely Perl-based build process reproducing a modern GCC on a 166 MHz Pentium, swapping RAM to disk because the motherboard can't hold that much memory. But the desk fan helps cooling. It takes about 2 months or 3 to build, but that's fine. I start it and then I work in the woods. It was identical to your releases about 5 times in the last 2 years (can't build more often), and now it isn't; something differs deep in the code sections. My Arduino-based floppy emulator is currently moving the binaries through the network"
Sure, it's a cyberpunk hero-fantasy, but deterministic builds would make these kinds of shenanigans possible.
And at the end of the day, independent validation is one of the strongest ways to fight corruption.
It's not about wanting non-reproducible builds, but about what I'm sacrificing to achieve reproducible builds. Debian's reproducible build effort has been going for ten years, and it's still not complete. Arguably Debian could have diverted ten years of engineering resources elsewhere. There's no end to the list of worthwhile projects to tackle, and clearly Debian believes that reproducible builds are a high priority, but reasonable people can disagree on that.
This is not to say reproducible builds are not worth doing, just that depending on your project / org lifecycle and available resources (plus a lot of subjective judgement), you may want to do something else first.
One is where the upstream software developer wants to build and sign their software so that users know it came from them, but distributors also want to be the ones to build and sign the software so they know what exactly it is they are distributing. The most public example is FDroid[1]. Reproducible builds allow both the software developer and the distributor to sign off on a single binary, giving users additional assurance that neither is sneaking something in. This is similar to the last example that Tavis gave, but shows that it is a workable process that provides real security benefit to the user, not just a hypothetical stretch.
The second is license enforcement. Companies that distribute (A/L)GPL software are required to distribute the exact source code that the binary was created from, and (for GPLv3) the ability to compile and replace the software with a modified version. However, a lot of companies are lazy about this and publish source code that doesn't include all their changes. A reproducible build demonstrates that the source they provided is what was used to create the binary. Of course, the lazy ones aren't going to go out of their way to create reproducible builds, but the more reproducible the upstream build system is, the fewer extraneous differences downstream builds should have. And it allows greater confidence in the good guys who are following the license.
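As a concrete illustration of that check, a compliance reviewer could rebuild the published source drop and compare the result byte-for-byte with what shipped. A minimal sketch, assuming a hypothetical `build.sh` that produces `out/firmware.bin` (all names made up):

```python
import hashlib
import subprocess

def sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical names: the source drop the vendor published and the binary
# actually shipped on the device.
subprocess.run(["./build.sh"], cwd="vendor-source-drop", check=True)

rebuilt = sha256("vendor-source-drop/out/firmware.bin")
shipped = sha256("firmware-from-device.bin")

if rebuilt == shipped:
    print("the published source reproduces the shipped binary")
else:
    print("mismatch: either the shipped binary wasn't built from this source, "
          "or the build simply isn't reproducible")
```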
And like others have said, I don't see the Tivoization argument at all. TiVo didn't have reproducible builds, and they Tivo'd their software just fine. At worst a reproducible build might pacify some security-minded folks who would otherwise object to Tivoization, but there will still be people who object to it out of the desire to modify the system.
Ages ago our device firmware release processes caught the early stage of a malware infection because the hash of one of our intermediate code generators (win32 exe) changed between two adjacent releases without any commits that should've impacted that tool.
Turns out they had hooked something into Windows to monitor for exe accesses and were accidentally patching our codegen.
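A sketch of the kind of release-time check that catches this, assuming a hypothetical tools directory and a hash manifest carried over from the previous release:

```python
import hashlib
import json
import pathlib

TOOLS_DIR = pathlib.Path("build-tools")      # hypothetical toolchain directory
MANIFEST = pathlib.Path("tool-hashes.json")  # hashes recorded at the previous release

def current_hashes() -> dict:
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(TOOLS_DIR.glob("*.exe"))
    }

previous = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
now = current_hashes()

# A tool whose hash changed without any commit touching it deserves a close look.
for name, digest in now.items():
    if name in previous and previous[name] != digest:
        print(f"WARNING: {name} changed since the last release")

MANIFEST.write_text(json.dumps(now, indent=2))
```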
Eventually you just stop trusting anything and live in the woods, I guess.
(This is an alternative to the Guix/Scheme thing).
That's just not enough. If you are hunting down tricky bugs, then even extremely minor things like the memory layout of your application might alter the behavior completely-- some uninitialized read might give you "0" every time in one build, while crashing everything with unexpected non-zero values in another; performance characteristics might change wildly and even trigger (or avoid) race conditions in builds from the exact same source thanks to cache interactions, etc.
There is a lot of developer preference in what an "ideal" process/toolchain/build environment looks like, but reproducible builds (unlike a lot of things that come down to preference) are an objective, qualitative improvement-- in the exact same way that it is an improvement if every release of your software corresponds to one exact set of source code.
As long as Debian provides source packages and access to their repos, digital signatures have nothing to do with reproducible builds; you don't actually need one to get the same bytes.
If that process involves Docker or Nix or whatever - that's fine. The point is that there is some robust way of transforming the source code into the artifact reproducibly. (The fewer moving parts involved in this process, the better, just as a matter of practicality. Locking up the original build machine in a bank vault and having to use it to reproduce the binary is a bit inconvenient.)
The point here is that there is a way for me to get to a "known good" starting point and that I can be 100% confident that it is good. Having a bit-reproducible process is the no-further-doubts-possible way of achieving that.
Sure, it is possible that I still get an artifact that is equivalent in all the ways I care about if I run the build in the exact same Docker container even if the binaries don't match (because, for example, some build step embeds a timestamp somewhere). But at that point I'll have to start investigating whether the cause of the difference is innocuous or whether there are problems.
Equivalence can only happen in one way, but there's an infinite number of ways to get inequivalence.
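The timestamp case above is common enough that there is a convention for it: reproducible-builds.org specifies the SOURCE_DATE_EPOCH environment variable so build steps can use a pinned time instead of the wall clock. A sketch of a stamping step that honors it (the output file name is made up):

```python
import os
import time

def build_timestamp() -> int:
    # SOURCE_DATE_EPOCH (reproducible-builds.org convention): if set, use it
    # instead of the wall clock, so the embedded timestamp depends on the
    # source release rather than on when the build happened to run.
    sde = os.environ.get("SOURCE_DATE_EPOCH")
    return int(sde) if sde is not None else int(time.time())

# Hypothetical stamping step that would otherwise differ on every rebuild.
with open("build_info.txt", "w") as f:
    f.write(f"timestamp={build_timestamp()}\n")
```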