764 points bertman | 29 comments
imcritic ◴[] No.43484638[source]
I don't get how someone achieves reproducibility of builds: what about file metadata like creation/modification timestamps? Do they forge them? Or is this data treated as not important enough (i.e., 2 files with different metadata but identical contents should have the same checksum when hashed)?
replies(10): >>43484658 #>>43484661 #>>43484682 #>>43484689 #>>43484705 #>>43484760 #>>43485346 #>>43485379 #>>43486079 #>>43488794 #
purkka ◴[] No.43484689[source]
Generally, yes: https://reproducible-builds.org/docs/timestamps/

Since the build is reproducible, it should not matter when it was built. If you want to trace a build back to its source, there are much better ways than a timestamp.

replies(1): >>43485622 #
1. ryandrake ◴[] No.43485622[source]
C compilers offer __DATE__ and __TIME__ macros, which expand to string constants that describe the date and time that the preprocessor was invoked. Any code using these would have different strings each time it was built, and would need to be modified. I can't think of a good reason for them to be used in an actual production program, but for whatever reason, they exist.
replies(4): >>43485670 #>>43486042 #>>43487552 #>>43488994 #
2. fmbb ◴[] No.43485670[source]
Toolchains for reproducible software likely let you set these values, or ensure they are 1970-01-01 00:00:00
replies(2): >>43486128 #>>43492261 #
3. mananaysiempre ◴[] No.43486042[source]
And that’s why GCC (among others) accepts SOURCE_DATE_EPOCH from the environment, and also has -Wdate-time. As for using __DATE__ or __TIME__ in code, I suspect that was more helpful in the age before ubiquitous source control and build IDs.
replies(1): >>43488470 #
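A minimal sketch of the SOURCE_DATE_EPOCH convention mentioned above, for a build script rather than for GCC itself. The function name `build_timestamp` and the `environ` parameter are hypothetical, and falling back to the current time when the variable is unset is an assumption about what a script would want, not a description of GCC's behaviour.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the build time as an aware UTC datetime.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) when it is set,
    and falls back to the current time otherwise.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    seconds = int(raw) if raw else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

With the variable exported, every run of the script embeds the same instant regardless of when it actually ran.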
4. mikepurvis ◴[] No.43486128[source]
Nix sets everything to the epoch, although I believe Debian's approach is to just use the date of the newest file in the dsc tarballs.
replies(2): >>43488193 #>>43492243 #
5. repiret ◴[] No.43487552[source]
> I can't think of a good reason for them

I work on a product whose user interface in one place says something like “Copyright 2004-2025”. The second year there is generated from __DATE__, that way nobody has to do anything to keep it up to date.

replies(1): >>43488135 #
6. Arelius ◴[] No.43488135[source]
I mean, you could do that, it's sort-of a lie though, maybe something better would be using the date of the most recent commit, which would be both more accurate, as far as authorship goes, and actually deterministic..

Pipe something like this into your build system:

    date --date "$(git log HEAD --author-date-order --pretty=format:"%ad" --date=iso | head -n1)" +"%Y"
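
The same idea can be sketched in Python for build scripts that are not shell-based. This is a variation, not the command above: it asks git for HEAD's author date as a Unix timestamp (`%at`) instead of parsing an ISO date, and the names `last_commit_epoch` and `copyright_line` are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def last_commit_epoch(repo_dir="."):
    """Ask git for the author date of HEAD as a Unix timestamp (%at)."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%at", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def copyright_line(first_year, commit_epoch):
    """Format a copyright range ending at the commit's year (in UTC)."""
    last_year = datetime.fromtimestamp(commit_epoch, tz=timezone.utc).year
    if last_year <= first_year:
        return f"Copyright {first_year}"
    return f"Copyright {first_year}-{last_year}"
```

Calling `copyright_line(2004, last_commit_epoch())` inside a checkout gives a string that depends only on the commit, not on when the build ran.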
7. yjftsjthsd-h ◴[] No.43488193{3}[source]
Nix can also set it to things other than 0; I think my favorite is to set it by the time of the commit from which you're building.
replies(2): >>43491317 #>>43492466 #
8. cperciva ◴[] No.43488470[source]
Source control only helps you if everything is committed. If you're, say, working on changes to the FreeBSD boot loader, you're probably not committing those changes every time you test something but it's very useful to know "this is the version I built ten minutes ago" vs "I just booted yesterday's version because I forgot to install the new code after I built it".
replies(5): >>43489912 #>>43490489 #>>43492048 #>>43492803 #>>43492948 #
9. rtpg ◴[] No.43488994[source]
It's super nice to have timestamps as a quick way to know what program you're looking at.

Sticking it into --version output is helpful to know if, for example, the Python binary you're looking at is actually the one you just built rather than something shadowing it.

replies(1): >>43498435 #
10. lmm ◴[] No.43489912{3}[source]
> If you're, say, working on changes to the FreeBSD boot loader, you're probably not committing those changes every time you test something

Whyever not? Does the FreeBSD boot loader not have a VCS or something?

replies(2): >>43489986 #>>43490170 #
11. cperciva ◴[] No.43489986{4}[source]
It's in the FreeBSD src tree. But we usually commit code once it's working...
replies(1): >>43512975 #
12. steveklabnik ◴[] No.43490170{4}[source]
A subtlety that may be lost: FreeBSD uses CVS, and so there isn't a way to commit locally while you're working, like with a DVCS.
replies(1): >>43495755 #
13. mananaysiempre ◴[] No.43490489{3}[source]
> you're probably not committing those changes every time you test something

I’m not, but I really think I should be. As in, there should be a thing that saves the state of the tree every time I type `make`, without any thought on my part.

This is (assuming Git—or Mercurial, or another feature-equivalent VCS) not hard in theory: just take your tree’s current state and put it somewhere, like in a merge commit to refs/compiles/master if you’re on refs/heads/master, or in the reflog for a special “stash”-like “compiles” ref, or whatever you like.

The reason I’m not doing it already is that, as far as I can tell, Git makes it stupendously hard to take a dirty working tree and index, do some Git to them (as opposed to a second worktree using the same gitdir), then put things back exactly as they were. I mean, that’s what `git stash` is supposed to do, right?.. Except if you don’t have anything staged then (sometimes?..) after `git stash pop` everything goes staged; and if you’ve added new files with `git add -N` then `git stash` will either refuse to work, or succeed but in such a way that a later `git stash pop` will not mark these files staged (or that might be the behaviour for plain `git add` on new files?). Gods help you if you have dirty submodules, or a merge conflict you’ve fixed but forgot to actually commit.

My point is, this sounds like a problem somebody’s bound to have solved by now. Does anyone have any pointers? As things are now, I take a look at it every so often, then remember or rediscover the abovementioned awfulness and give up. (Similarly for making precommit hooks run against the correct tree state when not all changes are being committed.)

replies(1): >>43490958 #
14. beecasthurlbow ◴[] No.43490958{4}[source]
An easy (ish) option here is to use autosquashing [1], which lets you create individual commits (saving your work - yay!) and then eventually clean em up into a single commit!

Eg

    git commit -am "Starting work on this important feature"

    # make some changes
    git add . && git commit --squash HEAD -m "I made a change"

Then once you’re all done, you can do an autosquash interactive rebase (`git rebase -i --autosquash`) and combine them all into your original change commit.

You can also use `git reset --soft $BRANCH_OR_COMMITTISH` to go back to an earlier commit but leave all changes (except maybe new files? Sigh) staged.

You also might check out `git reflog` to find commits you might’ve orphaned.

[1] https://thoughtbot.com/blog/autosquashing-git-commits

15. ◴[] No.43491317{4}[source]
16. jrockway ◴[] No.43492048{3}[source]
Versions built into the code are nice. I think the correct answer is to commit before the build proper starts (automatically, without changing your HEAD ref) and put that in there. Then you can check version control for the date information, but if someone else happens to add the same bytes to the same base commit, they also have the same version that you do. (Similarly, you can always make the date "XXXXXXXXXXXXXXXXXXXXXX" or something, and just replace the bytes with the actual date after the build as you deploy it.)

What I actually did at $LAST_JOB for dev tooling was to build in <commit sha> + <git diff | sha256> which is probably not amazingly reproducible, but at least you can ask "is the code I have right now what's running" which is all I needed.

Finally, there is probably enough flexibility in most build systems to pick between "reuse a cache artifact even if it has the wrong stamping metadata", "don't add any real information", and "spend an extra 45 cpu minutes on each build because I want $time baked into a module included by every other source file". I have successfully done all 3 with Bazel, for example.

replies(1): >>43500244 #
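A rough sketch of the "<commit sha> + <git diff | sha256>" stamp described above, assuming the caller has already collected the outputs of `git rev-parse HEAD` and `git diff HEAD`. The function name `dev_version` and the 12-character truncation are hypothetical choices, not what was used at the job mentioned.

```python
import hashlib


def dev_version(commit_sha, diff_bytes):
    """Combine a commit id with a digest of the uncommitted diff."""
    if not diff_bytes:
        # Clean tree: the commit id alone identifies the source.
        return commit_sha
    digest = hashlib.sha256(diff_bytes).hexdigest()
    return f"{commit_sha}+{digest[:12]}"
```

Two people with the same base commit and byte-identical diffs get the same string. Untracked files do not appear in `git diff HEAD`, so they would not affect the stamp.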
17. lamby ◴[] No.43492243{3}[source]
Debian's approach is actually to use the date specified in the top entry in the debian/changelog file. That's more transparent and resilient than any mtime.
18. lamby ◴[] No.43492261[source]
Strangely enough, sometimes using the epoch can expose bugs in libraries (etc.) when running or building in a timezone west of Greenwich due to the negative time offset taking time "below" zero.
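One way to see the effect, as a small sketch: timestamp 0 rendered in a zone west of Greenwich lands in 1969, which code assuming "no dates before 1970" may not expect. The fixed UTC-5 offset and the name `epoch_local_year` are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset five hours west of Greenwich.
WEST = timezone(timedelta(hours=-5))


def epoch_local_year(tz):
    """Return the calendar year that timestamp 0 falls in for tz."""
    return datetime.fromtimestamp(0, tz=tz).year
```

Here `epoch_local_year(timezone.utc)` is 1970 while `epoch_local_year(WEST)` is 1969.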
19. terinjokes ◴[] No.43492466{4}[source]
Which is also used when the contents of a derivation will be included in a zip file. The Unix epoch is about a decade older than the zip epoch.
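A minimal sketch of why a timestamp of 0 does not work for zip members: the zip format's date field starts at 1980-01-01, so a packaging script has to clamp earlier values. The name `zip_date_time` is hypothetical, and this is not a description of how Nix handles it.

```python
from datetime import datetime, timezone

# Earliest date the ZIP (MS-DOS) timestamp format can represent.
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def zip_date_time(source_date_epoch):
    """Convert a Unix timestamp to a ZipInfo-style tuple, clamped to 1980."""
    dt = datetime.fromtimestamp(source_date_epoch, tz=timezone.utc)
    stamp = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)
    return max(stamp, ZIP_EPOCH)
```

The resulting tuple has the shape that Python's `zipfile.ZipInfo(filename, date_time=...)` accepts.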
20. account42 ◴[] No.43492803{3}[source]
Nobody cares about reproducibility of local development builds so just limit your use of date/time to those and use a more appropriate build reference for release builds.
21. chippiewill ◴[] No.43492948{3}[source]
Which is fine, you don't need to use a reproducible build for local dev and can just use the real timestamp.
22. cperciva ◴[] No.43495755{5}[source]
FreeBSD hasn't used CVS since 2008.
replies(1): >>43495830 #
23. steveklabnik ◴[] No.43495830{6}[source]
Huh! So, before I posted this, I went to go double check, and found https://wiki.freebsd.org/VersionControl. What I missed was the (now obvious) banner saying

> The sections below are currently a historical reference covering FreeBSD's migration from CVS to Subversion.

My apologies! At the end of the day, the point still stands in that SVN isn't a DVCS and so you wouldn't want to be committing unfinished code though, correct?

(I suspect I got FreeBSD mixed up with OpenBSD in my head here, embarrassing.)

replies(2): >>43498328 #>>43500995 #
24. jraph ◴[] No.43498328{7}[source]
You could still use git-svn, but yeah, as another commenter wrote, I don't think a reproducible build is that useful when debugging; it should be fine to have an actual timestamp in the binaries.
25. izacus ◴[] No.43498435[source]
The whole point of reproducible builds is that you don't need to rely on timestamps and similar information to know which binary you're looking at.
26. ◴[] No.43500244{4}[source]
27. cperciva ◴[] No.43500995{7}[source]
Well yes, but we've actually migrated to Git now. ;-)
replies(1): >>43505962 #
28. steveklabnik ◴[] No.43505962{8}[source]
Welp! Egg on my face twice!
29. lmm ◴[] No.43512975{5}[source]
Huh. If I was confident enough in a change to consider it worth doing an actual boot to test I'd certainly want to have it committed, to be able to track and go back to it. Even the broken parts of history are valuable IME.