The hard cases are things like unstable hash orderings, unsorted filesystem listings, parallel execution, address-space randomization, ...
Since the build is reproducible, it should not matter when it was built. If you want to trace a build back to its source, there are much better ways than a timestamp.
Annoying edge cases come up around internal object serialization, e.g. having to sort JSON keys when writing config files.
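One common fix is to canonicalize the generated file; a minimal sketch, assuming the config is JSON and jq is available (the paths are made up):

    # Sort keys on output so the file's bytes no longer depend on
    # hash-table iteration order inside the generator.
    jq -S . build/config.json > build/config.sorted.json
    mv build/config.sorted.json build/config.json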
There's lots of info on the Debian site about their reproducibility efforts, and there's a story from 2024's DebConf that may be of interest: https://lwn.net/Articles/985739/
I never actually checked that.
Yes. All archive entries, __DATE__-style source macros, and any other timestamps are set to a standardized date (in the past).
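The usual mechanism for this is the SOURCE_DATE_EPOCH convention from reproducible-builds.org; a sketch, assuming a git checkout:

    # Pin embedded timestamps to the last commit's date instead of "now".
    # GCC honors this for __DATE__/__TIME__; tools like GNU tar can
    # consume it explicitly, e.g. tar --mtime="@$SOURCE_DATE_EPOCH".
    export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)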
I work on a product whose user interface in one place says something like “Copyright 2004-2025”. The second year there is generated from __DATE__, so that nobody has to do anything to keep it up to date.
Pipe something like this into your build system:
# print just the year of the newest author date among commits reachable from HEAD
date --date "$(git log HEAD --author-date-order --pretty=format:"%ad" --date=iso | head -n1)" +"%Y"
ASLR by itself shouldn't cause reproducibility issues, but it can certainly expose bugs.
Timestamps are the least difficult part of reproducible builds to solve, but yes.
The real question is: why, in the past, was an entire ecosystem created where non-determinism was the norm and everybody thought it was somehow ok?
Instead of asking "how does one achieve reproducibility?" we might ask "why did people go out of their way to let something as simple as a timestamp break determinism?".
For that's the anti-security mindset we have to fight. And Debian did.
Sticking it into --version output is helpful for knowing whether, for example, the Python binary you're looking at is actually the one you just built rather than something shadowing it.
Sometimes programs have hash tables which use object identity as key (i.e. pointer).
ASLR can cause corresponding objects in different runs of the program to have different pointers, and be ordered differently in an identity hash table.
A program producing output that depends on this isn't necessarily buggy, but it does become a reproducibility issue.
E.g. a compiler might output some object in which a symbol table is ordered by a pointer hash. The difference in order doesn't change the meaning/validity of the object file, but it is seen as the build not having reproduced exactly.
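A quick way to confirm ASLR is involved is to rebuild with randomization disabled and compare artifacts; a sketch, assuming Linux (build.sh and the output path are hypothetical):

    # Run one build with ASLR off. If two such builds now produce
    # identical bytes, pointer-keyed ordering was the culprit.
    setarch "$(uname -m)" --addr-no-randomize ./build.sh
    sha256sum out/artifact.o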
Software was more artisanal in nature…
I’m not, but I really think I should be. As in, there should be a thing that saves the state of the tree every time I type `make`, without any thought on my part.
This is (assuming Git—or Mercurial, or another feature-equivalent VCS) not hard in theory: just take your tree’s current state and put it somewhere, like in a merge commit to refs/compiles/master if you’re on refs/heads/master, or in the reflog for a special “stash”-like “compiles” ref, or whatever you like.
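A minimal sketch of that, assuming Git: `git stash create` records the working tree and index as a commit and prints its hash without resetting anything (unlike `git stash push`), so the tree stays untouched:

    # Snapshot a dirty tree into refs/compiles/$branch without touching it.
    snap=$(git stash create "snapshot before make")
    [ -n "$snap" ] || snap=$(git rev-parse HEAD)  # tree was clean
    git update-ref "refs/compiles/$(git symbolic-ref --short HEAD)" "$snap"

(Caveat: `git stash create` ignores untracked files, so the `git add -N` pain below still applies.)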
The reason I’m not doing it already is that, as far as I can tell, Git makes it stupendously hard to take a dirty working tree and index, do some Git to them (as opposed to a second worktree using the same gitdir), then put things back exactly as they were. I mean, that’s what `git stash` is supposed to do, right?.. Except if you don’t have anything staged then (sometimes?..) after `git stash pop` everything goes staged; and if you’ve added new files with `git add -N` then `git stash` will either refuse to work, or succeed but in such a way that a later `git stash pop` will not mark these files staged (or that might be the behaviour for plain `git add` on new files?). Gods help you if you have dirty submodules, or a merge conflict you’ve fixed but forgot to actually commit.
My point is, this sounds like a problem somebody’s bound to have solved by now. Does anyone have any pointers? As things are now, I take a look at it every so often, then remember or rediscover the abovementioned awfulness and give up. (Similarly for making precommit hooks run against the correct tree state when not all changes are being committed.)
E.g.
git commit -am "Starting work on this important feature"
# make some changes
git add . && git commit --squash HEAD -m "I made a change"
Then once you’re all done, you can do an autosquash interactive rebase and combine them all into your original change commit. You can also use `git reset --soft $BRANCH_OR_COMMITTISH` to go back to an earlier commit but leave all changes (except maybe new files? Sigh) staged.
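For reference, the combining step is something like this (assuming the feature branched off main):

    # Replay the branch, folding each "squash!" commit into its target.
    git rebase -i --autosquash main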
You also might check out `git reflog` to find commits you might’ve orphaned.
Look, I get people use the tools they use, and Perl is fine, I guess; it does its job. But if you use it you can safely expect to be mocked for prioritizing string operations or whatever Perl offers over writing code anyone born after 1980 can read, let alone is willing to modify.
For such a social enterprise, open source orgs can be surprisingly daft when it comes to the social side of tool selection.
Would this tool be harder to write in Python? Probably. Is it a smart idea to use Python regardless? Absolutely. The aesthetics of Perl are an absolute dumpster fire. Larry Wall deserves prosecution for his crimes.
What I actually did at $LAST_JOB for dev tooling was to build in <commit sha> + <git diff | sha256> which is probably not amazingly reproducible, but at least you can ask "is the code I have right now what's running" which is all I needed.
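Something like this, wired into the stamping step (the output file name is made up):

    # Version = commit, plus a digest of any uncommitted tracked changes.
    sha=$(git rev-parse --short HEAD)
    dirty=$(git diff HEAD | sha256sum | cut -c1-12)
    echo "BUILD_VERSION=${sha}+${dirty}" > version.stamp

It won't catch untracked files, but it answers "is the code I have right now what's running".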
Finally, there is probably enough flexibility in most build systems to pick between "reuse a cache artifact even if it has the wrong stamping metadata", "don't add any real information", and "spend an extra 45 cpu minutes on each build because I want $time baked into a module included by every other source file". I have successfully done all 3 with Bazel, for example.
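Roughly, those three modes with Bazel (the flags are real; the target and status script are hypothetical):

    bazel build --nostamp //app   # reuse cache, placeholder stamp metadata
    bazel build --stamp //app     # stamped targets re-link with real metadata
    bazel build --stamp --workspace_status_command=./status.sh //app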
Then at some point you happen to need all the entries, you iterate, and you get a random order. That's not necessarily a problem until you want reproducible builds; reproducibility is a new requirement at that point, not a latent bug being exposed.
Perl always gets hate on HN, but I genuinely wonder how many of those commenters have actually spent more than a single hour using Perl after reading the Camel book.
Honest opinion: if you're going to be spending time in Linux in your career, then you should read the Camel book at least once. Then and only then should you get to have an opinion on Perl!
Imagine the filtering required for potential maintainers if they rewrote the packaging in JS.
> The sections below are currently a historical reference covering FreeBSD's migration from CVS to Subversion.
My apologies! At the end of the day, the point still stands: SVN isn't a DVCS, so you wouldn't want to be committing unfinished code, correct?
(I suspect I got FreeBSD mixed up with OpenBSD in my head here, embarrassing.)
Sorting had to be added to that kind of output.
And yes, agreed: people should read the Camel book!
Yes, of course. I would not write any kind of server in Perl; I would pick Go or Elixir or Erlang for such a use case.