764 points by bertman | 5 comments
c0l0 No.43484720
I never really understood the hype around reproducible builds. They seem to be mostly a vehicle for enabling tivoization[0] while keeping users sufficiently calm. With reproducible builds, a vendor can prove to users that they did build $binary from $someopensourceproject, and then digitally sign the result so that it, and only it, will load and execute on the vendor-provided and/or vendor-controlled platform. But that still kills effective software freedom as long as I, the user, cannot do the same thing with my own build (modified or not) of $someopensourceproject.

Therefore, I side with Tavis Ormandy on this debate: https://web.archive.org/web/20210616083816/https://blog.cmpx...

[0]: https://en.wikipedia.org/wiki/Tivoization
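For readers unfamiliar with the mechanism described above, here is a minimal Python sketch of the check a reproducible build enables: anyone rebuilds from the same sources and compares the SHA-256 digest of their artifact with a digest the vendor publishes. The function names and the "published digest" workflow are illustrative assumptions; real setups layer signatures and provenance metadata on top, which this sketch does not cover.

```python
import hashlib

def sha256_hex(artifact: bytes) -> str:
    """Hex SHA-256 digest of a build artifact's raw bytes."""
    return hashlib.sha256(artifact).hexdigest()

def matches_published_digest(my_build: bytes, published_digest: str) -> bool:
    """True if an independent rebuild is bit-for-bit identical to the artifact
    whose digest the vendor published (hypothetical workflow)."""
    return sha256_hex(my_build) == published_digest.lower()

# Illustrative stand-ins for real binaries.
vendor_binary = b"pretend binary built from v1.0 sources"
published = sha256_hex(vendor_binary)  # what the vendor would publish

print(matches_published_digest(vendor_binary, published))  # True
print(matches_published_digest(b"a different build", published))  # False
```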

klysm No.43484745
One of the big advantages, from my perspective, is that you can cache much more effectively throughout the build process when things are deterministic.
1. c0l0 No.43484849
To achieve that, it is enough to hash the inputs and cache the resulting outputs. Repeating a build from scratch with an empty cache would not necessarily have to yield the same hashes all the way down to the last artifact, but that's actually a simplification of the whole process, and not a bad thing per se.
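A minimal Python sketch of the scheme described above, assuming a simple in-memory cache: each build step's output is stored under a SHA-256 hash of its declared inputs, and a repeat with the same inputs is served from the cache without rerunning the step. The `InputHashCache` class and the `fake_compile` step are hypothetical illustrations, not any particular build tool's API.

```python
import hashlib
from typing import Callable, Dict, List

def input_key(inputs: List[bytes]) -> str:
    """Hash every declared input of a build step into a single cache key."""
    h = hashlib.sha256()
    for blob in inputs:
        # Length-prefix each input so [b"ab", b"c"] and [b"a", b"bc"] get different keys.
        h.update(len(blob).to_bytes(8, "big"))
        h.update(blob)
    return h.hexdigest()

class InputHashCache:
    """Stores each step's output under the hash of its inputs only.
    The output itself is never re-hashed, so the step need not be deterministic."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.misses = 0

    def build(self, inputs: List[bytes], step: Callable[[List[bytes]], bytes]) -> bytes:
        key = input_key(inputs)
        if key not in self._store:
            self.misses += 1
            self._store[key] = step(inputs)
        return self._store[key]

# Hypothetical "compiler" standing in for a real build step.
def fake_compile(srcs: List[bytes]) -> bytes:
    return b"obj:" + b"".join(srcs)

cache = InputHashCache()
cache.build([b"int main(void) { return 0; }"], fake_compile)
cache.build([b"int main(void) { return 0; }"], fake_compile)  # served from the cache
print(cache.misses)  # 1
```

Because only inputs are hashed, a nondeterministic step still caches correctly here; whichever output was produced first simply becomes the one that gets reused.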
2. mschuster91 No.43485374
> To achieve that it is enough to hash inputs, and cache resulting outputs.

Thing is, inputs can be nondeterministic too: some programs (used to) embed the current git commit hash into the final binary, so that `./foo --version` gives bug triage a quick and easy way to check whether the user is running a version from years ago.

3. layer8 No.43485507
This is only a problem if those nondeterministic inputs are actually included in the hash. This is often not the case, because the values are included implicitly in the build rather than explicitly.

(Just playing devil’s advocate here.)
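To illustrate the point above, here is a hypothetical Python sketch in which the commit hash reaches the artifact implicitly (as a version stamp) but is not part of the cache key. The cache keeps hitting, so caching itself is unaffected; the trade-off is that a cached artifact can carry a stamp from an earlier commit. The names `build`, `cache_key`, and `git_commit` are invented for the example.

```python
import hashlib
from typing import Dict

cache: Dict[str, str] = {}
misses = 0

def cache_key(source: str) -> str:
    # Only the explicitly declared input (the source text) is hashed.
    return hashlib.sha256(source.encode()).hexdigest()

def build(source: str, git_commit: str) -> str:
    """Hypothetical build step that also stamps an implicit input, the commit hash,
    which is deliberately left out of the cache key."""
    global misses
    key = cache_key(source)
    if key not in cache:
        misses += 1
        cache[key] = f"{source} [built from {git_commit}]"
    return cache[key]

first = build("print('hi')", git_commit="abc123")
second = build("print('hi')", git_commit="def456")  # cache hit despite the new commit
print(misses)  # 1
print(second)  # print('hi') [built from abc123]
```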

4. klysm No.43486011
Outputs are used as inputs later. If everything is deterministic, you can actually cache everything by hash.
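A hypothetical Python sketch of this point: when a step's output feeds the next step, the next step's cache key is the hash of that output. With a deterministic compiler, a comment-only source edit still reruns compilation but yields identical bytes, so the link step is a cache hit; a compiler that embeds a build time defeats that. The step names, the toy compilers, and the counter standing in for a clock are all assumptions made for illustration.

```python
import hashlib
import itertools
from typing import Callable, Dict

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ContentCache:
    """Caches each step's output under the hash of (step name, input bytes)."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}
        self.hits = 0

    def run(self, step: str, data: bytes, fn: Callable[[bytes], bytes]) -> bytes:
        key = digest(step.encode() + b"\0" + data)
        if key in self.store:
            self.hits += 1
        else:
            self.store[key] = fn(data)
        return self.store[key]

def deterministic_compile(src: bytes) -> bytes:
    # Hypothetical compiler: drops comment lines, so comment-only edits give identical output.
    return b"\n".join(line for line in src.splitlines() if not line.startswith(b"#"))

_clock = itertools.count()  # stands in for the wall clock

def stamped_compile(src: bytes) -> bytes:
    # Same compiler, but it embeds a "build time", making its output nondeterministic.
    return deterministic_compile(src) + b"\nbuilt-at:" + str(next(_clock)).encode()

def pipeline(cache: ContentCache, src: bytes, compile_fn: Callable[[bytes], bytes]) -> bytes:
    obj = cache.run("compile", src, compile_fn)
    # The compile output is the link input, so its hash becomes the link step's key.
    return cache.run("link", obj, lambda o: b"exe:" + o)

hits_by_compiler: Dict[str, int] = {}
for compile_fn in (deterministic_compile, stamped_compile):
    cache = ContentCache()
    pipeline(cache, b"# v1\nx = 1", compile_fn)
    pipeline(cache, b"# v2\nx = 1", compile_fn)  # comment-only edit
    hits_by_compiler[compile_fn.__name__] = cache.hits

print(hits_by_compiler)  # {'deterministic_compile': 1, 'stamped_compile': 0}
```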
5. telotortium No.43486399
Adding the Git hash is reproducible, assuming you build from a clean tree (which the build script can check). Embedding the current date and time is the canonical cause of non-reproducibility, but that can be worked around in most cases by embedding the commit and/or author date of the commit instead.
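A minimal Python sketch of the workaround described above, assuming a git checkout: the version string embeds the commit hash and the commit date rather than the current time, refuses to stamp a dirty tree, and honours the `SOURCE_DATE_EPOCH` environment variable (a Reproducible Builds convention) when a packager sets it. The function names are hypothetical; the git invocations (`rev-parse HEAD`, `status --porcelain`, `log -1 --format=%ct`) are standard git commands.

```python
import os
import subprocess
from datetime import datetime, timezone
from typing import Tuple

def build_stamp(commit: str, dirty: bool, commit_epoch: int) -> str:
    """Version string built only from commit metadata, so rebuilding the same
    clean checkout always yields the same bytes."""
    if dirty:
        # The commit hash only identifies the sources if nothing is uncommitted.
        raise RuntimeError("refusing to stamp a build from a dirty working tree")
    when = datetime.fromtimestamp(commit_epoch, tz=timezone.utc)
    return f"{commit[:12]} ({when.strftime('%Y-%m-%dT%H:%M:%SZ')})"

def read_git_state(repo: str = ".") -> Tuple[str, bool, int]:
    """Collect the inputs for build_stamp from git (requires git on PATH)."""
    def git(*args: str) -> str:
        return subprocess.run(["git", *args], cwd=repo, capture_output=True,
                              text=True, check=True).stdout.strip()
    commit = git("rev-parse", "HEAD")
    dirty = git("status", "--porcelain") != ""
    # Prefer SOURCE_DATE_EPOCH if set; otherwise use the committer date
    # (%ct; "%at" would give the author date instead).
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH") or git("log", "-1", "--format=%ct"))
    return commit, dirty, epoch

print(build_stamp("0123456789abcdef0123456789abcdef01234567", False, 0))
# 0123456789ab (1970-01-01T00:00:00Z)
```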