
766 points bertman | 10 comments
imcritic ◴[] No.43484638[source]

I don't get how someone achieves reproducibility of builds: what about file metadata like creation/modification timestamps? Do they forge them? Or is this data treated as not important enough (i.e. should 2 files with different metadata but identical contents have the same checksum when hashed)?

replies(10): >>43484658 #>>43484661 #>>43484682 #>>43484689 #>>43484705 #>>43484760 #>>43485346 #>>43485379 #>>43486079 #>>43488794 #
1. o11c ◴[] No.43484682[source]

Timestamps are the easiest part - you just set everything according to the chosen epoch.

The hard things involve things like unstable hash orderings, non-sorted filesystem listing, parallel execution, address-space randomization, ...
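
To make the timestamp and listing-order points concrete, here is a minimal Python sketch of packing a directory so the archive bytes depend only on paths and contents. The function name `deterministic_tar` is made up for illustration, and taking the epoch from the `SOURCE_DATE_EPOCH` environment variable is an assumption borrowed from the reproducible-builds convention, not something stated in this thread.

```python
import io
import os
import tarfile


def deterministic_tar(root: str, epoch: int) -> bytes:
    """Pack `root` into an uncompressed tar with normalized metadata."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk order follows the filesystem; pin it down
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    paths.sort()  # final member order depends only on the path names

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            info = tarfile.TarInfo(name=rel)
            info.size = len(data)
            info.mtime = epoch            # every member gets the chosen epoch
            info.mode = 0o644             # flattens permissions (loses +x)
            info.uid = info.gid = 0       # drop builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    # Assumption: the epoch comes from SOURCE_DATE_EPOCH, defaulting to 0.
    chosen_epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    print(len(deterministic_tar(".", chosen_epoch)), "bytes")
```

Compression is left out on purpose: gzip headers carry their own timestamp and would need the same treatment.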

replies(1): >>43485157 #
2. koolba ◴[] No.43485157[source]

ASLR shouldn’t be an issue unless you intend to capture the entire memory state of the application. It’s an intermediate representation in memory, not an output of any given step of a build.

Annoying edge cases come up for things like internal object serialization, where you have to sort things like JSON keys in config files.
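
A rough sketch of the JSON case in Python, assuming the config is a plain dict; `canonical_json` is a hypothetical helper name. Sorting keys and fixing the separators makes the serialized bytes independent of insertion order.

```python
import json


def canonical_json(obj) -> str:
    """Serialize with sorted keys and fixed separators so the output
    does not depend on the order in which keys were inserted."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    first = {"b": 1, "a": {"d": 2, "c": 3}}
    second = {"a": {"c": 3, "d": 2}, "b": 1}
    # Same content, different insertion order, identical output.
    print(canonical_json(first) == canonical_json(second))
```

`sort_keys` applies to nested objects too, but it does not reorder lists, so any list built from an unordered collection still has to be sorted before serializing.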

replies(3): >>43485872 #>>43488447 #>>43489756 #
3. sodality2 ◴[] No.43485872[source]

Let’s say a compiler is doing something in a multi-threaded manner - isn’t it possible that ASLR would affect the ordering of certain events which could change the compiled output? Sure you could just set threads to 1 but there’s probably some more edge cases in there I haven’t thought of.

replies(1): >>43486161 #
4. zamadatix ◴[] No.43486161{3}[source]

I think you'd need the compiler to guarantee serialization order of such operations regardless of whether you use ASLR. Otherwise you're just hoping thread scheduling, core clocking, thread memory access, and many other things are the same between every system trying to do a reproducible build. Even setting threads to 1 may not solve that problem class if asynchronous functions/syscalls come into play.
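
One way to picture "guarantee serialization order" is to let work run in parallel but collect results by input position rather than by completion time. This is only a Python sketch of that idea; `compile_unit` and `build` are hypothetical stand-ins, not how any particular compiler does it.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_unit(name: str) -> str:
    # Hypothetical stand-in for real per-unit work.
    return f"obj({name})"


def build(units: list[str], workers: int = 4) -> list[str]:
    """Run units in parallel, but emit results in a fixed order."""
    ordered = sorted(units)  # the order is chosen by the data, not the scheduler
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whichever task finishes first.
        return list(pool.map(compile_unit, ordered))


if __name__ == "__main__":
    print(build(["c", "a", "b"]))
```

Collecting with `as_completed` instead would hand the output order back to the scheduler, which is exactly the dependency being described.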

5. cperciva ◴[] No.43488447[source]

FreeBSD tripped over an issue recently where a C++ program (I think clang?) used a collection of pointers and output values in an order based on the pointers rather than the values they pointed to.

ASLR by itself shouldn't cause reproducibility issues, but it can certainly expose bugs.

replies(1): >>43492945 #
6. kazinator ◴[] No.43489756[source]

ASLR means that the pointers from malloc (which may come from mmap) are not predictable.

Sometimes programs have hash tables which use object identity as key (i.e. pointer).

ASLR can cause corresponding objects in different runs of the program to have different pointers, and be ordered differently in an identity hash table.

A program producing some output which depends on this is not necessarily a bug, but becomes a reproducibility issue.

E.g. a compiler might output some object in which a symbol table is ordered by a pointer hash. The difference in order doesn't change the meaning/validity of the object file, but it is seen as the build not having reproduced exactly.
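
The same effect can be sketched in Python, where objects without a custom `__hash__` are hashed by identity (in CPython the default hash is derived from the object's address). `Symbol` and `emit_symbol_table` are hypothetical names; the point is only that lookups can stay identity-keyed while the emitted order comes from a stable key.

```python
class Symbol:
    """Hypothetical symbol record; hashing and equality are by identity."""

    def __init__(self, name: str, value: int):
        self.name = name
        self.value = value


def emit_symbol_table(symbols: set) -> list:
    # Iterating the set directly follows hash order, which here depends on
    # where each object happened to be allocated. Sort by a stable key
    # at the point where the order becomes visible in the output.
    ordered = sorted(symbols, key=lambda s: (s.name, s.value))
    return [(s.name, s.value) for s in ordered]


if __name__ == "__main__":
    table = {Symbol("main", 0), Symbol("foo", 16), Symbol("bar", 8)}
    print(emit_symbol_table(table))
```

The set is still fine for membership tests and lookups; only the iteration that feeds the output needs the explicit sort.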

replies(1): >>43492749 #
7. account42 ◴[] No.43492749{3}[source]

That's just one example of nondeterminism in compilers though - in the end it's the responsibility of the compiler to provide options not to do that.

replies(1): >>43495112 #
8. ahartmetz ◴[] No.43492945{3}[source]

It is sometimes just fine to have a hash table with pointers as keys. It is by design an unordered collection, so you do not care about the order, only about finding entries.

Then at some point you happen to need all the entries, you iterate, and you get a random order. Which is not necessarily a problem unless you want reproducible builds, which is just a new requirement, not exposing a latent bug.

9. kazinator ◴[] No.43495112{4}[source]

Not for external causes like ASLR and memory allocators; those things should have their respective options for that.

replies(1): >>43503110 #
10. account42 ◴[] No.43503110{5}[source]

There is no guarantee that memory allocation is deterministic even without ASLR. If your program is supposed to be deterministic but its output depends on the memory addresses returned by the allocator then your program is buggy.