Memory Integrity Enforcement

(security.apple.com)
458 points circuit | 32 comments
tptacek ◴[] No.45186809[source]
> Both approaches revealed the same conclusion: Memory Integrity Enforcement vastly reduces the exploitation strategies available to attackers. Though memory corruption bugs are usually interchangeable, MIE cut off so many exploit steps at a fundamental level that it was not possible to restore the chains by swapping in new bugs. Even with substantial effort, we could not rebuild any of these chains to work around MIE. The few memory corruption effects that remained are unreliable and don’t give attackers sufficient momentum to successfully exploit these bugs.

This is great, and a bit of a buried lede. Some of the economics of mercenary spyware depend on chains with interchangeable parts, and countermeasures targeting that property directly are interesting.

replies(3): >>45188753 #>>45190761 #>>45191353 #
1. leoc ◴[] No.45188753[source]
In terms of Apple Kremlinology, should this be seen as a step towards full capability-based memory safety like CHERI ( https://en.wikipedia.org/wiki/Capability_Hardware_Enhanced_R... ) or more as Apple signaling that it thinks it can get by without something like CHERI?
replies(2): >>45189145 #>>45189370 #
2. bri3d ◴[] No.45189145[source]
IMO it's the latter; CHERI requires a lot of heavy lifting at the compile-and-link layer that restricts application code behaviors, and an enormous change to the microarchitecture. On the other hand, heap-cookies / tag secrets can be delegated to the allocator at runtime in something like MIE / MTE, and existing component-level building blocks like the SPTM can provide some of the guarantees without needing a whole parallel memory architecture for capabilities like CHERI demands.
replies(3): >>45192088 #>>45194072 #>>45194123 #
3. pizlonator ◴[] No.45189370[source]
MTE and CHERI are so different that it’s hard and maybe not even possible to do both at the same time (you might not have enough spare bits in a CHERI 128 bit ptr for the MTE tag)

They also imply a very different system architecture.

replies(3): >>45189425 #>>45189430 #>>45191990 #
4. quotemstr ◴[] No.45189425[source]
> MTE and CHERI are so different that it’s hard and maybe not even possible to do both at the same time (you might not have enough spare bits in a CHERI 128 bit ptr for the MTE tag)

Why would you need MTE if you have CHERI?

replies(2): >>45189480 #>>45189490 #
5. leoc ◴[] No.45189430[source]
Sure, I'm not suggesting that Apple might actually do both at the same time. They could however implement the less burdensome one now while intending to replace it with the all-singing-all-dancing alternative down the line.
replies(1): >>45189462 #
6. pizlonator ◴[] No.45189462{3}[source]
Gotcha. My point about different systems architectures makes me think it’s unlikely that you’d want to do that
7. pizlonator ◴[] No.45189480{3}[source]
Not saying you’d want both. Just answering why MTE isn’t a path to CHERI

But here’s a reason to do both: CHERI’s UAF story isn’t great. Adding MTE means you get a probabilistic story at least
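To make "probabilistic" concrete, here is a minimal simulation sketch. It assumes 4-bit tags and an allocator that picks a uniformly random tag every time memory is handed out; real allocators may use smarter policies (for example, avoiding the previous tag). The function names are made up for illustration.

```python
import random

NUM_TAGS = 16  # assumption: 4-bit tags, as in the common MTE scheme


def stale_pointer_survives(rng: random.Random) -> bool:
    """One use-after-free attempt under a uniform random tagging policy."""
    old_tag = rng.randrange(NUM_TAGS)  # tag baked into the dangling pointer
    new_tag = rng.randrange(NUM_TAGS)  # tag chosen when the memory is reused
    return old_tag == new_tag          # the access only passes on a tag match


def undetected_rate(trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated stale accesses that slip past the tag check."""
    rng = random.Random(seed)
    hits = sum(stale_pointer_survives(rng) for _ in range(trials))
    return hits / trials
```

Under these assumptions `undetected_rate()` should land close to 1/16, which is the sense in which the UAF protection is probabilistic rather than deterministic.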

replies(2): >>45189519 #>>45189544 #
8. bri3d ◴[] No.45189490{3}[source]
Why would you need CHERI if you have working mitigations that don't demand a second bus?

I think it's two halves of the same coin and Apple chose the second half of the coin.

The two systems are largely orthogonal; I think if Apple chose to go from one to the other it will be a generational change rather than an incremental one. The advantage of MTE/MIE is you can do it incrementally by just changing the high bits the allocator supplies; CHERI requires a fundamental paradigm shift. Apple love paradigm shifts but there's no indication they're going to do one here; if they do, it will be a separate effort.
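As a rough illustration of "changing the high bits the allocator supplies": the sketch below packs a 4-bit tag into bits 59:56 of a 64-bit pointer, which is where Arm MTE keeps its logical tag. The helper names and the sample address are hypothetical, and this only models the pointer arithmetic, not the hardware check.

```python
TAG_SHIFT = 56               # MTE logical tag lives in address bits 59:56
TAG_MASK = 0xF << TAG_SHIFT  # 4-bit tag field


def with_tag(addr: int, tag: int) -> int:
    """Return addr with its tag field replaced by tag (0..15)."""
    if not 0 <= tag <= 0xF:
        raise ValueError("tag must fit in 4 bits")
    return (addr & ~TAG_MASK) | (tag << TAG_SHIFT)


def tag_of(ptr: int) -> int:
    """Extract the 4-bit tag from a tagged pointer."""
    return (ptr >> TAG_SHIFT) & 0xF


def address_of(ptr: int) -> int:
    """Strip the tag field, leaving the plain address bits."""
    return ptr & ~TAG_MASK


raw = 0x0000_7F12_3456_7890  # hypothetical heap address
ptr = with_tag(raw, 0xA)     # what a tagging allocator would hand back
```

The pointer stays the same width and application code passes it around unchanged, which is why this can be rolled out in the allocator without changing the language-level pointer model.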

replies(2): >>45189560 #>>45189567 #
9. quotemstr ◴[] No.45189519{4}[source]
Some progress on UAF though! https://dl.acm.org/doi/10.1145/3703595.3705878
10. bri3d ◴[] No.45189544{4}[source]
True! On the flip side, MTE sucks at intra-object corruption: if I get access to a heap object with pointers, MTE doesn't affect me, I can go ahead and write to that object because I own the tag.
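A toy model of the intra-object point, assuming 16-byte tag granules and one tag per allocation. The object layout (a 16-byte buffer followed by a pointer field) and all names are invented for the example.

```python
GRANULE = 16  # assumption: one tag per 16-byte granule


class TaggedMemory:
    """Tiny model of tag-checked memory; not a real MTE interface."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)

    def set_tag(self, start: int, length: int, tag: int) -> None:
        for g in range(start // GRANULE, (start + length - 1) // GRANULE + 1):
            self.tags[g] = tag

    def store(self, addr: int, ptr_tag: int, value: int) -> None:
        # The check compares the pointer's tag with the granule's tag only.
        if self.tags[addr // GRANULE] != ptr_tag:
            raise MemoryError("tag check fault")
        self.data[addr] = value


mem = TaggedMemory(128)
mem.set_tag(0, 32, tag=3)   # hypothetical object: buf[16] + pointer field
mem.set_tag(32, 32, tag=9)  # neighbouring allocation with a different tag


def overflow_into_field() -> None:
    mem.store(16, 3, 0x41)  # past buf, but still inside the same object


def overflow_into_neighbor() -> None:
    mem.store(32, 3, 0x41)  # crosses into the neighbour: tag mismatch
```

`overflow_into_field()` succeeds because the whole object shares tag 3, while `overflow_into_neighbor()` raises `MemoryError`. That is the gap being described: the tag says which allocation you own, not which field.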

Overall my _personal_ opinion is that CHERI is a huge win at a huge cost, while MTE is a huge win at a low cost. But, there are definitely vulnerability classes that each system excels at.

replies(1): >>45189587 #
11. als0 ◴[] No.45189560{4}[source]
Second bus?
replies(1): >>45189628 #
12. pizlonator ◴[] No.45189567{4}[source]
CHERI is deterministic.

That’s strictly better, in theory.

(Not sure it’s practically better. You could make an argument that it’s not.)

replies(2): >>45189645 #>>45193366 #
13. pizlonator ◴[] No.45189587{5}[source]
I think the intra object issue might be niche enough to not matter.

And CHERI fixes it only optionally, if you accept having to change a lot more code

replies(2): >>45189613 #>>45192137 #
14. bri3d ◴[] No.45189613{6}[source]
I think I broadly agree with you. IMO tagging is practically much, much more valuable than capabilities systems modeled like CHERI.
replies(1): >>45189721 #
15. bri3d ◴[] No.45189628{5}[source]
CHERI fundamentally relies on capabilities living in memory that is architecturally separate from program memory. You could do so using a bus firewall, but then you're at the same place as MIE with the SPTM.
replies(3): >>45190908 #>>45192005 #>>45194952 #
16. bri3d ◴[] No.45189645{5}[source]
FWIW (I am a nobody compared to you; I didn't make FIL-C :) ) - I think that MIE/MTE are practically superior to CHERI.

I also think this argument is compelling because one exists in millions of consumer devices, with more to come (MTE -> MIE), and one does not.

17. quotemstr ◴[] No.45189721{7}[source]
Yes, but CHERI opens whole new system design possibilities, including things like ultra-cheap intra-address-space security boundaries. See https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201607...

> We have used CHERI’s ISA facilities as a foundation to build a software object-capability model supporting orders of magnitude greater compartmentalization performance, and hence granularity, than current designs. We use capabilities to build a hardware-software domain-transition mechanism and programming model suitable for safe communication between mutually distrusting software

and https://github.com/CTSRD-CHERI/cheripedia/wiki/Colocation-Tu...

> Processes are Unix' natural compartments, and a lot of existing software makes use of that model. The problem is, they are heavy-weight; communication and context switching overhead make using them for fine-grained compartmentalisation impractical. Cocalls, being fast (order of magnitude slower than a function call, order of magnitude faster than a cheapest syscall), aim to fix that problem.

> This functionality revolves around two functions: cocall(2) for the caller (client) side, and coaccept(2) for the callee (service) side. Underneath they are implemented using CHERI magic in the form of CInvoke / LDPBR CPU instruction to switch protection domains without the need to enter the kernel, but from the API user point of view they mostly look like ordinary system calls and follow the same conventions, errno et al.

There's a decent chance that we get back whatever performance we pay for CHERI with interest as new systems architecture possibilities open up.

MTE helps us secure existing architectures. CHERI makes new architectures possible.

replies(1): >>45190296 #
18. saagarjha ◴[] No.45190296{8}[source]
Yes, but this breaks mirror mappings.
replies(1): >>45192097 #
19. MBCook ◴[] No.45190908{6}[source]
So something like having built in RAM for the pagetables that aren’t part of the normal pool? That way no matter what kind of attack you come up with user space cannot pass a pointer to it?
20. jrtc27 ◴[] No.45191990[source]
We actually have ideas for how to combine the two; see section C.5 of https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-987.pdf
21. jrtc27 ◴[] No.45192005{6}[source]
That's not true. Capabilities are in main memory as much as any other data. The tags are in separate memory (whether a wider SRAM, DRAM ECC bits, or a separate table off on the side in a fraction of memory that's managed by the memory controller; all three schemes have been implemented and have trade-offs). But this is also true of MTE; you do not want those tags in normal software-visible main memory either, they need to be protected.
22. jrtc27 ◴[] No.45192088[source]
To reiterate what I've said elsewhere, CHERI does not need a whole parallel memory architecture, there is just one that gets a slight extension over a non-CHERI/MTE system to include tags. But that is the same story as MTE, which also needs to propagate the tags in the memory system (and in fact, more tags, since we just need one bit per 16 bytes, whereas MTE needs 4 bits per 16 bytes in the common scheme).
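The tag-storage comparison works out as follows. This is just the arithmetic from the figures above (1 bit vs 4 bits per 16 bytes); the 8 GiB memory size in the usage note is an arbitrary example, not something from the thread.

```python
from fractions import Fraction


def tag_overhead(tag_bits: int, granule_bytes: int = 16) -> Fraction:
    """Tag bits as a fraction of the data bits they cover."""
    return Fraction(tag_bits, granule_bytes * 8)


def tag_storage_bytes(memory_bytes: int, tag_bits: int,
                      granule_bytes: int = 16) -> int:
    """Bytes of tag storage needed to cover memory_bytes of data."""
    return memory_bytes // granule_bytes * tag_bits // 8


cheri = tag_overhead(1)  # 1/128, about 0.78 %
mte = tag_overhead(4)    # 1/32, about 3.1 %
```

For a hypothetical 8 GiB of RAM, `tag_storage_bytes(8 * 2**30, 1)` gives 64 MiB for a 1-bit scheme and `tag_storage_bytes(8 * 2**30, 4)` gives 256 MiB for a 4-bit scheme.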
23. jrtc27 ◴[] No.45192097{9}[source]
Can you elaborate on what you perceive as broken?
replies(1): >>45194424 #
24. jrtc27 ◴[] No.45192137{6}[source]
Where studies suggest "a lot" is sub-0.1%. For example, https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f2... was a study into porting 6 million lines of C and C++ to run a KDE+X11 desktop stack on CHERI, and saw 0.026% LoC change, or ~1.5k LoC out of ~6 million LoC, all done in just 3 months by one person. That's even an overestimate, because it includes many changes to build systems just to be able to cross-compile the projects. It's not nothing, but it's the kind of thing where a single engineer can feasibly port large bodies of code. Yes, certain systems code will be worse (like JITs), but the vast majority of cases are not that, and even those are still feasible (e.g. we have people working with Chromium and V8).
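A quick check that the quoted numbers are consistent with each other, using only the figures in the comment above (0.026%, ~1.5k LoC, ~6 million LoC):

```python
from fractions import Fraction

TOTAL_LOC = 6_000_000                 # "~6 million LoC"
CHANGED_PERCENT = Fraction(26, 1000)  # 0.026 %

# Lines changed implied by the percentage.
changed_loc = TOTAL_LOC * CHANGED_PERCENT / 100  # 1560

# Percentage implied by the rounded "~1.5k LoC" figure.
percent_from_1500 = Fraction(1500, TOTAL_LOC) * 100  # 0.025 %
```

So 0.026% of 6 million is 1,560 lines, which matches the "~1.5k" figure once rounding is allowed for.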
replies(1): >>45197116 #
25. VogonPoetry ◴[] No.45193366{5}[source]
This is on the verge of pedantry - CHERI determinism isn't strictly true, garbage collecting abandoned descriptors is currently done asynchronously. Malicious code could attempt to reuse an abandoned descriptor before it is "disappeared". I think it might be possible to construct a synthetic situation where two threads operating with perhaps different privilege in the same address space (something CHERI can support!) that share an IPC channel might be affected by the timing.

There is a section in the technical reports that talks about garbage collection.

I don't think CHERI is currently being used with different privileged threads in the same address space.

replies(1): >>45195078 #
26. mschuster91 ◴[] No.45194072[source]
> CHERI requires a lot of heavy lifting at the compile-and-link layer that restricts application code behaviors, and an enormous change to the microarchitecture.

Well, Apple already routinely forces developers to recompile their applications so if Apple wants to introduce something needing a compiler / toolchain update they can do that easily. And they also control the entire SoC from start to finish and unlike pretty much everyone else also hold an ARM Architecture License so they can go and change whatever they want in the hardware side as well.

27. checker659 ◴[] No.45194123[source]
> compile-and-link layer

Not to mention the dynamic linker.

replies(1): >>45194569 #
28. saagarjha ◴[] No.45194424{10}[source]
mremap?
29. jrtc27 ◴[] No.45194569{3}[source]
Yeah you need a compiler, linker and OS. That's true of any security technology. CHERI may be more significant in that regard because it's a bigger rethink than just stuffing some extra metadata into the existing types, but it's not at all intractable. We, a research group, maintain CheriBSD, a "full-fat" port of FreeBSD to CHERI (Morello and CHERI-RISC-V), so to a big tech organisation it's a small investment. The cost to tech companies is not making it work, it's often much more boring business factors.
30. Findecanor ◴[] No.45194952{6}[source]
A CHERI capability is stored in main memory but with the tag bit for that location set. The tags are stored in separate memory pages, also in main memory in current designs.
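A small model of the tag-bit idea, assuming one out-of-band validity bit per 16-byte slot and the rule that an ordinary data store clears it. The class and method names are invented, and real hardware has more detail (for instance, the fault happens on use of an untagged capability, modelled here by `dereference`).

```python
CAP_SIZE = 16  # assumption: 128-bit capabilities, one tag bit per slot


class CapMemory:
    """Toy memory where each 16-byte slot has a hidden validity tag."""

    def __init__(self, slots: int) -> None:
        self.slots = [bytes(CAP_SIZE) for _ in range(slots)]
        self.tags = [False] * slots  # out of band, not addressable as data

    def store_capability(self, slot: int, cap_bytes: bytes) -> None:
        if len(cap_bytes) != CAP_SIZE:
            raise ValueError("capability must be 16 bytes")
        self.slots[slot] = cap_bytes
        self.tags[slot] = True       # only a capability store sets the tag

    def store_data(self, slot: int, data: bytes) -> None:
        self.slots[slot] = data
        self.tags[slot] = False      # any plain data write clears the tag

    def load_capability(self, slot: int) -> tuple[bytes, bool]:
        return self.slots[slot], self.tags[slot]


def dereference(cap: tuple[bytes, bool]) -> bytes:
    """Using a capability whose tag is clear faults."""
    bits, valid = cap
    if not valid:
        raise PermissionError("tag clear: not a valid capability")
    return bits
```

The point of the sketch: the capability's bytes sit in ordinary memory, but copying or forging the same bytes with `store_data` yields something that loads with `valid == False`, so it cannot be dereferenced.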

Maybe you've been confused by a description of how it works inside a processor. In early CHERI designs, capabilities were in different architectural processor registers from integers.

In recent CHERI designs, the same register numbers are used for capabilities and other registers. A micro-architecture could be designed to have either all registers be capability registers with the tag bit, or use register renaming to separate integer and capability registers.

I suppose a CHERI MCU for embedded systems with small memory could theoretically have tag pages in separate SRAM instead of caching main memory, but I have not seen that.

31. Findecanor ◴[] No.45195078{6}[source]
I suspect that the parent poster was referring to MTE's memory protection being probabilistic. There are only 16 tag values for an attacker to guess. You can combine MTE and PAC, but PAC is also only probabilistic.

With CHERI, there is nothing to guess. You either have a capability or you don't.
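For the guessing side, a back-of-the-envelope sketch. It assumes 16 equally likely tag values and independent guesses, which is a simplification; the function names are made up.

```python
from fractions import Fraction

NUM_TAGS = 16  # 4-bit tag, as stated above


def p_all_guesses_succeed(k: int) -> Fraction:
    """Chance that k independent blind tag guesses all pass the check."""
    return Fraction(1, NUM_TAGS) ** k


def p_detected(k: int) -> Fraction:
    """Chance that at least one of k blind guesses faults."""
    return 1 - p_all_guesses_succeed(k)
```

One blind guess passes with probability 1/16; a chain needing three lucky guesses passes with probability 1/4096. With a capability model there is no equivalent number to compute, since there is no value to guess.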

32. pizlonator ◴[] No.45197116{7}[source]
Does that study include enabling intra object overflow protection, or not?

When I say that this optional feature would force you to change a lot more code I’m comparing CHERI without intra object overflow protection to CHERI with intra object overflow protection.

Finally, 6 million lines of code is not that impressive. Real OSes are measured in billions