
182 points by yarapavan | 21 comments
neuroelectron No.43616167
Very suspicious article. Sounds like the "nothing to see here folks, move along" school of security.

Reproducibility is more like a security smell: a symptom that you're doing things right. Determinism is the correct target, and it is subtly different.

The focus on the supply chain is a distraction. The “trusting trust” attack Ken Thompson described in 1984 is still among the most elegant and devastating: infected development toolchains can spread horizontally into “secure” builds.

Just because it’s open doesn’t mean anyone’s been watching closely. "50 years of security"? Important pillars of OSS have been touched by thousands of contributors with varying levels of oversight. Many commits predate strong code-signing or provenance tracking. If a compiler was compromised at any point, everything it compiled—including future versions of itself—could carry that compromise forward invisibly. This includes even "cleanroom" rebuilds.

replies(4): >>43616257 #>>43617725 #>>43621870 #>>43622202 #
1. lrvick No.43616257
The best defense we have against the Trusting Trust attack is full source bootstrapping, now done by two distros: Guix and Stagex.
replies(2): >>43616330 #>>43625793 #
2. AstralStorm No.43616330
No, you do not. If you have not actually validated each and every source package, your trust extends only to the generated binaries corresponding to the sources you had. The trusting trust attack was deployed against the source code of the compiler, poisoning specific binaries. Do you know whether GCC 6.99 or 7.0 doesn't insert a backdoor under some specific condition?

There's no static or dynamic analysis deployed to enhance this level of trust.

The initial attempts are simulated execution, as in Valgrind; all the sanitizer work; and perhaps diffing at the functional level, beyond the text of the source code, where it's too easy to smuggle things through... (like on an abstracted conditional graph).

We cannot even compare binaries or executables correctly across differing compiler revisions.

replies(4): >>43616446 #>>43616959 #>>43617254 #>>43618041 #
3. neuroelectron No.43616446
Besides full source bootstrapping, which could adopt progressive verification of hardware features under an assumption of untrusted hardware, integrating formal verification into the lowest levels of bootstrapping is a must. Bootstrap security with the compiler.

This won't protect against more complex attacks like ROP or unverified state. For that we need to implement simple artifacts that are verifiable and mapped. Return to simpler return states (pass/error). Do error handling external to the compiled binaries. Automate state mapping and combine it with targeted fuzzing. systemd is a perfect example of what not to do: internal logs and error states handled by a web of interdependent systems.

replies(1): >>43616594 #
4. AstralStorm No.43616594
ROP and unverified state would at least be highlighted by such an analysis. Generally it's a lot of work, and we cannot quite trust fully automated systems to flag it for us... especially when some optimizer changes between versions of the compiler. Even a single compile flag can turn the abstract language upside down, much less the execution graph...

Fuzzing is good but probabilistic. It is unlikely to hit on a deliberate backdoor. Solid for finding bugs though.

replies(1): >>43616968 #
5. lrvick No.43616959
So, for example, Google uses a Goobuntu/Bazel-based toolchain to produce their Go compiler binaries.

The full-source-bootstrapped Go compiler binaries in stagex exactly match the hashes of the ones Google releases, giving us as much confidence as we can get in the source->binary chain, a problem which until very recently had no solution at all.
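
To make that concrete, here is a minimal sketch of the comparison in Go (file paths hypothetical; the real check diffs our artifacts against the digests of Google's published releases):

    // comparehash.go — toy sketch of the reproducibility check described
    // above: hash our bootstrapped binary and the upstream one, then diff.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "io"
        "os"
    )

    // sha256File returns the hex SHA-256 digest of the file at path.
    func sha256File(path string) string {
        f, err := os.Open(path)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer f.Close()
        h := sha256.New()
        if _, err := io.Copy(h, f); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        return hex.EncodeToString(h.Sum(nil))
    }

    func main() {
        ours := sha256File("stagex/go")     // our full-source-bootstrapped build (hypothetical path)
        theirs := sha256File("upstream/go") // binary from the official release (hypothetical path)
        if ours != theirs {
            fmt.Println("MISMATCH:", ours, "!=", theirs)
            os.Exit(1)
        }
        fmt.Println("match:", ours)
    }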

Go has unique compiler design choices that make it very self-contained, which makes this possible, though we can also deterministically build Rust, or any other language, from any OCI-compatible toolchain.

You are talking about one layer down from that, the source code itself, which is our next goal as well.

Our plan is this:

1. Be able to prove all released artifacts came from hash-locked source code (done)

2. Develop a universal normalized identifier for all source code regardless of origin (a treehash of all sources whether from git, a tar file, etc., ignoring/removing generated files, docs, examples, or anything not needed to build) (in progress; see the sketch below)

3. Build a distributed code-review system to coordinate the work of multiple signed reviews by reputable security researchers for every source package, keyed by its universal identifier (planning stages)

We are the first distro to reach step 1, and have a reasonably clear path to steps 2 and 3.

We feel step 2 would be a big leap forward on its own, as it would have fully eliminated the xz attack, where the backdoor hid in the tar archive but not in the actual git tree.
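
As a toy sketch of what step 2 could look like (the skip list and normalization rules here are illustrative assumptions, not the final design): walk the tree, prune anything not needed to build, and hash sorted relative paths plus contents, so a git checkout and a tar archive of the same source yield the same identifier:

    // treehash.go — toy sketch of a normalized source identifier (step 2).
    // The skip list and normalization below are illustrative, not a spec.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "io"
        "io/fs"
        "os"
        "path/filepath"
        "sort"
    )

    // skip holds directories that should not affect the identifier.
    var skip = map[string]bool{".git": true, "docs": true, "examples": true}

    func main() {
        if len(os.Args) != 2 {
            fmt.Fprintln(os.Stderr, "usage: treehash <source-dir>")
            os.Exit(2)
        }
        root := os.Args[1]
        var files []string
        err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
            if err != nil {
                return err
            }
            if d.IsDir() && skip[d.Name()] {
                return filepath.SkipDir // prune non-build content
            }
            if !d.IsDir() {
                files = append(files, p)
            }
            return nil
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        // Sort so the result is independent of filesystem traversal order,
        // one of the things git trees and tar archives disagree on.
        sort.Strings(files)
        h := sha256.New()
        for _, p := range files {
            rel, err := filepath.Rel(root, p)
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
            }
            io.WriteString(h, filepath.ToSlash(rel))
            h.Write([]byte{0}) // separator between path and contents
            f, err := os.Open(p)
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
            }
            if _, err := io.Copy(h, f); err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
            }
            f.Close()
        }
        fmt.Println(hex.EncodeToString(h.Sum(nil)))
    }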

Pointing out these classes of problem is easy. I know; I did it for years. Actually removing attack surface, dramatically, is a lot more rewarding.

Help welcome!

6. lrvick No.43616968
I agree here. Use automated tools to find low hanging fruit, or mistakes.

There is unfortunately no substitute for a coordinated effort to do documented review of our toolchain sources by capable security researchers.

7. rcxdude No.43617254
That's a different problem. The threat in Trusting Trust is that the backdoor may not ever appear in public source code.
8. pabs3 No.43618041
Code review systems like CREV are the solution to backdoors being present in public source code.

https://github.com/crev-dev/

9. egberts1 No.43625793
Gentoo is a full source bootstrap if you include the build of GRUB2 and create the initramfs file as well as the kernel.
replies(1): >>43626690 #
10. lrvick No.43626690
Full source bootstrapping means you build with 100% human-auditable source code or machine code. The only path to do this today that I am aware of is via hex0, building up to Mes and tinycc, and on up to a modern C compiler: https://github.com/fosslinux/live-bootstrap/blob/master/part...

As far as I know, Gentoo, even from their "stage0", still assumes you bring your own bootstrap compiler toolchain, and thus is not self-bootstrapping.

replies(1): >>43627808 #
11. fuhsnn No.43627808
The fosslinux/live-bootstrap project is more about bootstrapping from a minimal binary seed than about auditability. For the latter, I'd argue that having a readable C cross-compiler is clearer than going through multiple steps involving several programming or scripting languages.
replies(1): >>43640744 #
12. lrvick No.43640744
But how do you build that readable C cross-compiler?

Full source bootstrapping is our only way out of the trusting trust problem.

replies(1): >>43641210 #
13. fuhsnn No.43641210
You bootstrap the compiler with itself and audit whether the compiler binary has exactly the same semantics as its source.

>Full source bootstrapping is our only way out of the trusting trust problem

No, that just defers the trust to all the tools and scripts that the fosslinux/live-bootstrap project provides.
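
The mechanical half of that is the classic stage comparison; here is a sketch with a hypothetical seed compiler and source layout. Note that a bit-identical fixed point alone is also what a Thompson-style backdoor achieves, which is why the audit of the binary against the source is the part that matters:

    // fixpoint.go — toy sketch of the self-rebuild check (names hypothetical):
    // stage1 = seed(src), stage2 = stage1(src); require stage1 == stage2.
    // A bit-identical fixed point alone does NOT rule out a Thompson-style
    // backdoor that reproduces itself — that is what the audit is for.
    package main

    import (
        "bytes"
        "fmt"
        "os"
        "os/exec"
    )

    // build runs a compiler over the compiler's own source.
    func build(compiler, src, out string) {
        cmd := exec.Command(compiler, "-o", out, src)
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            fmt.Fprintln(os.Stderr, "build failed:", err)
            os.Exit(1)
        }
    }

    func mustRead(path string) []byte {
        b, err := os.ReadFile(path)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        return b
    }

    func main() {
        const src = "compiler.c" // the compiler's own source (hypothetical)
        build("./seed-cc", src, "stage1")
        build("./stage1", src, "stage2")
        if bytes.Equal(mustRead("stage1"), mustRead("stage2")) {
            fmt.Println("fixed point: stage1 == stage2")
        } else {
            fmt.Println("stages differ: nondeterminism or tampering")
            os.Exit(1)
        }
    }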

replies(1): >>43643157 #
14. akoboldfrying No.43643157
> You bootstrap the compiler with itself

To be able to do this, you must already have both the source for the compiler and what someone has told you is a binary compiled from it. But what if that someone was lying?

replies(1): >>43645100 #
15. fuhsnn No.43645100
Not a programmer, are you? Programmers can fully investigate the compiled binary without anyone even having a chance to lie to them. If a team doesn't have the ability to audit the decompilation of a 10k-LOC C compiler at least once, I doubt their chances against a backdoor hidden in the hundreds of steps of https://github.com/fosslinux/live-bootstrap/blob/master/part...
replies(2): >>43647215 #>>43649502 #
16. lrvick No.43647215
Not everyone that programs is versed in decompiling, digital forensics, reverse engineering, etc.

Anyway, so your means of forming trust in a compiler faithfully compiling code is to trust a decompiler to faithfully generate human-readable source code, followed by a lot of manual review labor repeated by every user who wishes to distrust the maintainers.

Okay, but a decompiler could be backdoored just as easily as a compiler, hiding malicious code rather than injecting it.

How do you get a decompiler you trust more than the compiler you are reviewing? Do you decompile the decompiler with itself? Back at the trusting trust problem.

Decompilers are way more complex than anything in the hex0->tinycc bootstrap path.

replies(1): >>43649504 #
17. akoboldfrying No.43649502
As another commenter observed, having to trust a decompiler doesn't reduce the amount of trust you need to provide, it increases it. Reducing the amount of trust is our high-level goal, remember?

But let's not focus too hard on the logic side of your argument. The part that really convinced everyone that you're right was your opening statement, "Not a programmer, are you?". From that moment it was clear that you were taking the discussion to a higher plane, far above boring everyday logic.

Like a superhero, really. At least, that's how I picture you.

replies(1): >>43649687 #
18. fuhsnn No.43649504
> Anyway, so your means of forming trust in a compiler faithfully compiling code is to trust a decompiler to faithfully generate human-readable source code

No, it is to fully audit the binary of the compiler itself. If you don't trust a decompiler, learn to read machine code; the output from a simple C compiler tends to be pretty predictable.

> manual review labor repeated by every user who wishes to distrust the maintainers

Yes? What's wrong with that? For anyone who wishes to distrust, you give them the tools and knowledge to verify the process; the more people able to do this, the better.

replies(1): >>43671090 #
19. fuhsnn No.43649687
That was the response to your "what someone has told you is a binary" argument. If you had learnt the basics of programming, you would know that verifying a binary is just a hexdump away; there's no one else in the room to tell you anything. You hit compile and see the result yourself. It's simple, direct, and intimate. Yeah, you could say it feels like a superpower, and it's a skill everyone can learn.
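
The hexdump in question is nothing exotic either; a toy version in Go (a real audit reads the whole binary, not just the first 256 bytes):

    // hexdump.go — toy: dump the first bytes of a binary you just built.
    package main

    import (
        "encoding/hex"
        "fmt"
        "io"
        "os"
    )

    func main() {
        if len(os.Args) != 2 {
            fmt.Fprintln(os.Stderr, "usage: hexdump <binary>")
            os.Exit(2)
        }
        f, err := os.Open(os.Args[1])
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer f.Close()
        d := hex.Dumper(os.Stdout) // canonical offset/hex/ASCII layout
        defer d.Close()
        if _, err := io.CopyN(d, f, 256); err != nil && err != io.EOF {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }
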
replies(1): >>43671128 #
20. lrvick No.43671090
It is going to be a heroic shared win for the entire community if we get people to even do basic review of dependencies in languages where we have the actual source code. Trying to get people to ignore the source code and instead decompile and review every binary they use on every computer they use, including, somehow, the decompiler, is a lost cause.

We should expect that only a few people will review code, and only if it is drive-by easy to do. That means proving the binaries for sure came from the published, commented, formatted code, and then going and reviewing that code.

21. lrvick No.43671128
So you just dump hex and know exactly what a program does, and can quickly determine whether it uses good entropy sources, makes good cryptography choices, etc., in the same amount of time or less than it would take you to read the published source code to verify the same?

If you can do that, you are the only one alive who can.