←back to thread

764 points bertman | 2 comments | | HN request time: 1.413s | source
Show context
imcritic ◴[] No.43484638[source]
I don't get how someone achieves reproducibility of builds: what about files metadata like creation/modification timestamps? Do they forge them? Or are these data treated as not important enough (like it 2 files with different metadata but identical contents should have the same checksum when hashed)?
replies(10): >>43484658 #>>43484661 #>>43484682 #>>43484689 #>>43484705 #>>43484760 #>>43485346 #>>43485379 #>>43486079 #>>43488794 #
o11c ◴[] No.43484682[source]
Timestamps are easiest part - you just set everything according to the chosen epoch.

The hard things involve things like unstable hash orderings, non-sorted filesystem listing, parallel execution, address-space randomization, ...

replies(1): >>43485157 #
koolba ◴[] No.43485157[source]
ASLR shouldn’t be an issue unless you intend to capture the entire memory state of the application. It’s an intermediate representation in memory, not an output of any given step of a build.

Annoying edge cases come up for things like internal object serialization to sort things like JSON keys in config files.

replies(3): >>43485872 #>>43488447 #>>43489756 #
1. cperciva ◴[] No.43488447[source]
FreeBSD tripped over an issue recently where a C++ program (I think clang?) used a collection of pointers and output values in an order based on the pointers rather than the values they pointed to.

ASLR by itself shouldn't cause reproducibility issues, but it can certainly expose bugs.

replies(1): >>43492945 #
2. ahartmetz ◴[] No.43492945[source]
It is sometimes just fine to have a hash table with pointers as keys. It is by design an unordered collection, so you do not care about the order, only about finding entries.

Then at some point you happen to need all the entries, you iterate, and you get a random order. Which is not necessarily a problem unless you want reproducible builds, which is just a new requirement, not exposing a latent bug.