Fifty Years of Open Source Software Supply Chain Security

1. lrvick ◴[07 Apr 25 19:35 UTC] No.43615037[source]▶

Great coverage, though it fails to mention code review, artifact signing, and full source bootstrapping, which are fundamental defenses most distros skip.

In our distro, Stagex, our threat model assumes at least one maintainer, sysadmin, or computer is compromised at all times.

This has resulted in some specific design choices and practices:

- 100% deterministic, hermetic, reproducible

- full source bootstrapped from 180 bytes of human-auditable machine code

- all commits signed by authors

- all reviews signed by reviewers

- all released artifacts are multi-party reproduced and signed

- fully OCI (container) native all the way down "FROM scratch"

- All packages easily hash-locked to give downstream software easy determinism as well

This all goes well beyond the tactics used in Nix and Guix.

As far as we know, Stagex is the only distro designed to strictly distrust maintainers.

https://stagex.tools

replies(4): >>43616418 #>>43617025 #>>43617119 #>>43621868 #

2. AstralStorm ◴[07 Apr 25 22:00 UTC] No.43616418[source]▶

>>43615037 (TP) #

Good step.

It doesn't distrust the developers of the software though, so it does not fix the biggest hole. Multiparty reproduction does not fix it either; that only distrusts the build system.

The bigger the project, the higher the chance something slips through, even if it is only an exploitable bug. Maybe it's the developer themselves being compromised, or their maintainer.

Reviews are done on what, you have someone reviewing clang code? Binutils?

replies(3): >>43616685 #>>43616980 #>>43618024 #

3. TacticalCoder ◴[07 Apr 25 22:36 UTC] No.43616685[source]▶

>>43616418 #

> Reviews are done on what, you have someone reviewing clang code? Binutils?

There aren't random developers pushing commits to these codebases: these are used by virtually every Linux distro out there (OK, maybe not the Kubernetes one that ships only 12 binaries, forgot its name).

It seems obvious to me that GP is talking about protection against rogue distro maintainers, not fundamental packages being backdoored.

You're basically saying: "GP's work is pointless because Linus could insert a backdoor in the Linux kernel".

In addition to that, determinism and 100% reproducibility bring another gigantic benefit: should a backdoor ever be found in clang or one of the binutils tools, it's going to be 100% reproducible. And that is a big thing: being able to reproduce a backdoor is a godsend for security.

replies(1): >>43617954 #

4. lrvick ◴[07 Apr 25 23:30 UTC] No.43616980[source]▶

>>43616418 #

As the other (dead, but correct) commenter pointed out, job one is proving the released binary artifacts even match source code, as that is the spot that is most opaque to the public where vulns can most easily be injected (and have been in the past over and over and over).

Only with this problem solved can we prove that the code humans will ideally start spending a lot more time reviewing (working on it) is actually the code that ships in compiled artifacts.

replies(1): >>43617409 #

5. no-dr-onboard ◴[07 Apr 25 23:36 UTC] No.43617025[source]▶

>>43615037 (TP) #

100% reproducible? That's amazing. I'll be honest, I don't really believe you (which I suppose is the point, right?).

Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.

What are you all using to verify commits? Are you guys verifying signatures against a public PKI?

Super interested as I manage the reproducibility program for a large software company.

replies(2): >>43618018 #>>43618056 #

6. floxy ◴[07 Apr 25 23:54 UTC] No.43617119[source]▶

>>43615037 (TP) #

>full source bootstrapped from 180 bytes of human-auditable machine code

What does this mean? You have a C-like compiler in 180 bytes of assembler that can compile a C compiler that can then compile GCC?

replies(2): >>43617264 #>>43617320 #

7. mananaysiempre ◴[08 Apr 25 00:29 UTC] No.43617264[source]▶

>>43617119 #

That’s normally what this means, yes, with a few more intermediate steps. There’s only one bootstrap chain like this that I know of[1,2,3], maintained by Jeremiah Orians and the Guix project; judging from the reference to 180 bytes, that’s what the distro GP describes is using as well.

> This is a set of manually created hex programs in a Cthulhu Path to madness fashion. Which only have the goal of creating a bootstrapping path to a C compiler capable of compiling GCC, with only the explicit requirement of a single 1 KByte binary or less.

[1] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

[2] https://savannah.nongnu.org/projects/stage0/

[3] https://github.com/oriansj/bootstrap-seeds

replies(1): >>43617391 #

8. skulk ◴[08 Apr 25 00:40 UTC] No.43617320[source]▶

>>43617119 #

As per their landing page, yes.

> stage0: < 190 byte x86 assembly seed is reproduced on multiple distros

> stage1: seed builds up to a tiny c compiler, and ultimately x86 gcc

> stage2: x86 gcc bootstraps target architecture cross toolchains

very impressive, I want to try this out now.

replies(1): >>43618029 #

9. floxy ◴[08 Apr 25 00:56 UTC] No.43617391{3}[source]▶

>>43617264 #

That's pretty awesome

replies(1): >>43618005 #

10. charcircuit ◴[08 Apr 25 01:02 UTC] No.43617409{3}[source]▶

>>43616980 #

>can most easily be injected (and have been in the past over and over and over).

In practice this is much rarer than a user downloading and running malware or visiting a site that exploits their browser. Compare the number of 0days Chrome has had over the years versus the number of times bad actors have hacked Google and replaced download links with links to malware.

replies(1): >>43617945 #

11. lrvick ◴[08 Apr 25 02:51 UTC] No.43617945{4}[source]▶

>>43617409 #

Nothing can stop users from being tricked, but normalizing the expectation of signing is our best defense. For instance, we trained users to expect the green lock, and started normalizing passkeys and FIDO2, which prove you are on the correct domain, taking phishing off the table.

Non-web software distribution, particularly for developers, has failed to mature significantly here. Most developers today use brew, nix, alpine, dockerhub, etc. None are signed in a way that allows end users to automatically prove they got artifacts that were faithfully and deterministically built from the expected source code. Could be malware, could be anything. The typical blind trust contract from developers to CDNs that host final compiled artifacts baffles me. Of course you will get malware this way.

Stagex by contrast uses OCI standard signing, meaning you can optionally set a containers/policy.json file in docker or whatever container runtime you use that will cause it to refuse to run any stagex images without reproduction signatures by two or more maintainers.

If you choose to, you can automatically rule out any single developer or system in the stagex chain from injecting malware into your projects.
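A minimal sketch of the "two or more maintainers" rule described above. It models only the threshold logic and assumes each signature has already been cryptographically verified elsewhere; it is not how any container runtime or stagex implements the check. The fingerprints and the `accepted_digest` helper are hypothetical names for illustration.

```python
from collections import defaultdict

# Hypothetical trusted maintainer key fingerprints (placeholders, not real keys).
TRUSTED = {"FPR_ALICE", "FPR_BOB", "FPR_CAROL"}


def accepted_digest(attestations, trusted=TRUSTED, threshold=2):
    """Return the digest vouched for by at least `threshold` distinct trusted
    signers, or None if no digest (or more than one) reaches that quorum.

    `attestations` is an iterable of (signer_fingerprint, digest) pairs whose
    signatures are assumed to have been verified already.
    """
    signers_by_digest = defaultdict(set)
    for signer, digest in attestations:
        if signer in trusted:
            # A set ignores repeat signatures from the same key.
            signers_by_digest[digest].add(signer)

    winners = [d for d, s in signers_by_digest.items() if len(s) >= threshold]
    # Refuse ambiguous results: two different digests both reaching quorum
    # means the builds diverged, so nothing should be trusted.
    return winners[0] if len(winners) == 1 else None
```

With a threshold of two, a single compromised key cannot get an image accepted on its own, because repeated signatures from one key count once and unknown keys are ignored.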

replies(1): >>43623332 #

12. lrvick ◴[08 Apr 25 02:52 UTC] No.43617954{3}[source]▶

>>43616685 #

> OK, maybe not the Kubernetes one that ships only 12 binaries, forgot its name

You are likely thinking of Talos Linux, which incidentally also builds itself with stagex.

13. lrvick ◴[08 Apr 25 03:02 UTC] No.43618005{4}[source]▶

>>43617391 #

Yep, Guix and stagex are the only two distros that full source bootstrap, to my knowledge.

We use an abbreviated and explicit stage0 chain here for easy auditing: https://codeberg.org/stagex/stagex/src/branch/main/packages/...

replies(1): >>43618028 #

14. pabs3 ◴[08 Apr 25 03:05 UTC] No.43618018[source]▶

>>43617025 #

Read through these websites and LWN articles:

https://reproducible-builds.org/ https://bootstrappable.org/ https://bootstrapping.miraheze.org/ https://lwn.net/Articles/983340/ https://lwn.net/Articles/985739/

15. pabs3 ◴[08 Apr 25 03:06 UTC] No.43618024[source]▶

>>43616418 #

The code review problem is something solvable by something like CREV, where the developer community at large publishes the reviews they have done, and eventually there is good coverage of most things.

https://github.com/crev-dev/

16. pabs3 ◴[08 Apr 25 03:08 UTC] No.43618028{5}[source]▶

>>43618005 #

IIRC the FreeDesktop flatpak runtimes are also built from the Bootstrappable Builds folks' full source bootstrap.

17. pabs3 ◴[08 Apr 25 03:09 UTC] No.43618029{3}[source]▶

>>43617320 #

The LWN article is a good place to start:

https://lwn.net/Articles/985739/

18. lrvick ◴[08 Apr 25 03:15 UTC] No.43618056[source]▶

>>43617025 #

Indeed you do not have to believe me.

> git clone https://codeberg.org/stagex/stagex

> cd stagex

> make

Several hours later your "out" directory will contain locally built OCI images for every package in the tree, and the index.json for each should contain the exact same digests we commit in the "digests" folder, and the same ones multiple maintainers sign in the OCI standard "signatures" folder.
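A rough sketch of that comparison, for anyone who wants to script it. It relies on the OCI image layout convention that `index.json` carries a `manifests` array with a `digest` per entry. The `out/<package>/index.json` directory shape and the `expected` mapping are assumptions made for this sketch; the actual format of the "digests" folder is not described here, so loading it is left out.

```python
import json
from pathlib import Path


def manifest_digests(layout_dir):
    """Return the manifest digests listed in an OCI image layout's index.json."""
    index = json.loads((Path(layout_dir) / "index.json").read_text())
    return [m["digest"] for m in index.get("manifests", [])]


def mismatches(out_dir, expected):
    """Compare locally built digests against expected ones.

    `expected` maps a package name to its expected digest. The layout
    out_dir/<package>/index.json is an assumption of this sketch.
    Returns {package: (expected, found_list)} for every package that differs.
    """
    bad = {}
    for package, want in expected.items():
        try:
            found = manifest_digests(Path(out_dir) / package)
        except FileNotFoundError:
            found = []  # a package that was never built counts as a mismatch
        if found != [want]:
            bad[package] = (want, found)
    return bad
```

An empty result from `mismatches` means every listed package reproduced bit for bit; anything else names the packages worth investigating.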

We build with only a light make wrapper around docker today, though it assumes you have it configured to use the containerd image store backend, which allows for getting deterministic local digests without uploading to a registry.

No reason you cannot build with podman or kaniko etc with some tweaks (which we hope to support officially)

> Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.

We try to keep our package definitions "FROM scratch", in "linux from scratch" style, with no magic, so they are self-documenting and easy to audit or reference. By all means crib any of our tactics. We use no global env, so each package has only the determinism tweaks needed (if any). We heavily referenced Alpine, Arch, Mirage, Guix, Nix, and Debian to arrive at our current patterns.
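To illustrate the kind of tweak in question, here is a generic sketch (not taken from the stagex tree) of removing the usual archive-level sources of non-determinism: entry order, timestamps, ownership, and modes. The `deterministic_tar` and `digest` helpers are hypothetical names; the fixed `mtime` could equally come from the reproducible-builds `SOURCE_DATE_EPOCH` convention.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files, mtime=0):
    """Build a tar archive whose bytes depend only on file names and contents.

    `files` maps archive paths to bytes. Entries are written in sorted order,
    and timestamps, ownership, and modes are pinned to fixed values.
    """
    buf = io.BytesIO()
    # Pin the archive format explicitly so the header layout cannot vary.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob):
    """Return an OCI-style sha256 digest string for a byte string."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()
```

Two calls with the same contents in a different insertion order yield identical bytes, and therefore the same digest, which is the property a reproducible package build needs from every step that writes files.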

> What are you all using to verify commits? Are you guys verifying signatures against a public PKI?

We all sign commits, reviews, and releases with well published PGP keys maintained in smartcards, with expected public keys in the MAINTAINERS file. Most of us have keyoxide profiles as well making it easy to prove all our online presences agree with the expected fingerprints for us.

> Super interested as I manage the reproducibility program for a large software company.

By all means drop in our matrix room, #stagex:matrix.org . Not many people working on these problems. The more we can all collaborate to unblock each other the better!

19. ◴[08 Apr 25 13:58 UTC] No.43621868[source]▶

>>43615037 (TP) #

20. charcircuit ◴[08 Apr 25 16:06 UTC] No.43623332{5}[source]▶

>>43617945 #

>Nothing can stop users from being tricked

But an operating system can limit the blast radius. Proper sandboxing is much more important than securing the supply chain.

replies(1): >>43626762 #

21. lrvick ◴[08 Apr 25 21:44 UTC] No.43626762{6}[source]▶

>>43623332 #

You can't have a secure sandbox on your workstation without a secure supply chain. Who builds your qemu or Xen binary or enclave image?

Maybe you mean sandboxes like secure enclaves. Almost every solution there builds non-deterministically with unsigned containers any of many maintainers can modify at any time, with minimal chance of detection. Maybe you have super great network monitoring, but if I compromise the CI/CD system to compile all binaries with a non-random RNG, then I can undermine any cryptography you use, and can re-create any session keys or secrets you can. Game over.
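A toy illustration of the RNG point, assuming the compromised build swaps the system's random source for a generator seeded with a value the attacker knows. `ATTACKER_SEED` and both helper names are hypothetical; a real backdoor would be far subtler, but the consequence is the same.

```python
import random
import secrets


def weak_session_key(build_seed, nbytes=16):
    """Key drawn from a PRNG whose seed was fixed at build time."""
    rng = random.Random(build_seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))


def strong_session_key(nbytes=16):
    """Key drawn from the operating system's CSPRNG."""
    return secrets.token_bytes(nbytes)


ATTACKER_SEED = 1337  # hypothetical value baked into the compromised build

victim_key = weak_session_key(ATTACKER_SEED)    # generated on the victim's machine
attacker_key = weak_session_key(ATTACKER_SEED)  # recomputed offline by the attacker
```

Here `victim_key == attacker_key` holds without any traffic ever leaving the victim's network, which is why network monitoring does not help against this class of attack.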

Qubes has the best sandboxing solution of any workstation OS, but that relies on Fedora which is not fully reproducible, and only signed via centralized single-party-controlled infrastructure. Threaten the right person and you can backdoor qubes and everyone that uses it.

I say this as a qubes user, because it is the least bad workstation sandboxing option we have. We must fix the supply chain to have server or workstation sandboxes we can trust.

By contrast, I help maintain airgapos, repros, and enclaveos, which are each special-purpose immutable appliance operating systems that function as sandboxes for cold key management, secure software builds, and remotely attestable isolated software respectively. All are built with stagex and are deterministic, so a local build should give you the same hash any other maintainer gets, proving your artifacts faithfully came from the easily reviewable sources.

replies(1): >>43629321 #

22. charcircuit ◴[09 Apr 25 06:05 UTC] No.43629321{7}[source]▶

>>43626762 #

>You can't have a secure sandbox on your workstation without a secure supply chain.

Yes, you can as they are independent things.

>Maybe you mean sandboxes like secure enclaves.

No I mean sandbox as in applications are sandboxed from the rest of the system. If you just run an application it shouldn't be able to encrypt all of your files. The OS should protect the rest of the system from potentially badly behaving applications.

>but if I compromise the CI/CD system to compile all binaries with a non-random RNG, then I can undermine any cryptography you use, and can re-create any session keys or secrets you can

In practice this is a much rarer kind of an attack. Investing a ton in strengthening the front door is meaningless when the backdoor is completely open. Attackers will attack the weakest link.

>Qubes has the best sandboxing solution of any workstation OS

Qubes only offers sandboxing between qubes. There isn't sandboxing within a qube.

>proving your artifacts faithfully came from the easily reviewable sources.

Okay, but as mentioned previously those sources could have vulnerabilities or be malicious. Or users could run other software they have downloaded separately or via a curl | sh.

replies(1): >>43640857 #

23. lrvick ◴[10 Apr 25 05:15 UTC] No.43640857{8}[source]▶

>>43629321 #

> Yes, you can as they are independent things.

I sandbox everything in hypervisors, I get it, but you cannot trust a sandbox some internet rando built for you is actually sandboxing. You have to full source bootstrap your sandbox to be guaranteed that the compromise of any of hundreds of dev machines in the usual supply chains did not backdoor your hypervisor.

You need both.

> Attackers will attack the weakest link.

Agreed, and today that is supply chain attacks. I have done them myself in the wild, multiple times. It is often as easy as buying the expired email domain of an AWOL maintainer and doing a password reset for GitHub, Docker Hub, GoDaddy, etc. until you control a package in piles of supply chains. Or, in the case of most Linux distros, just submit a couple of bugfixes and apply to be a maintainer, and you have official god access to push any code to major Linux distro supply chains with little to no oversight.

Cheap and effective attacks.

> Qubes only offers sandboxing between qubes. There isn't sandboxing within a qube.

You are expected to run a distinct kernel and VM for each security context. The Linux kernel is pretty shit at isolating trusted code from untrusted code on its own. Hypervisors are the only reliable sandbox we have, so spin up tiny VMs for every workload.

> Okay, but as mentioned previously those sources could have vulnerabilities or be malicious.

Yes of course, and we need a community wide push to review all this code (working on it) but most of the time supply chain attacks are not even in the repos where someone might notice. They are introduced covertly in the release process of the source code tarballs, or in the final artifact generation flows, or in the CDNs that host those final artifacts. Then people review code, and assume that code is what generated final artifacts.

> Or users could run other software they have downloaded separately or via a curl | sh

Some users will always shoot themselves in the foot if they are uneducated on security, so that is a separate education problem. Supply chain attacks, however, will hit even users doing everything right, and often burn thousands of people at once. Those of us that maintain and distribute software are obligated to give users safe methods to prove software artifacts are faithfully generated from publicly accountable source code, and to teach them not to trust any maintainers, including us.

Education is the biggest problem on all sides here. For my part, every "curl | sh" I have ever encouraged users to run in the wild is a troll to teach users to never run those.