
Memory Integrity Enforcement

(security.apple.com)
458 points by circuit | 11 comments
1. brcmthrowaway No.45188066
How does this compare to CHERI?
replies(2): >>45188739, >>45188796
2. bri3d No.45188739
Substantially less complex and therefore likely to be substantially easier to actually use.

CHERI-Morello uses 129-bit capability objects to tag operations, has a parallel capability stack, capability pointers, and requires microarchitectural support for a tag storage memory. Basically with CHERI-Morello, your memory operations also need to provide a pointer to a capability object stored in the capability store. Everything that touches memory points to your capability, which tells the processor _what_ you can do with memory and the bounds of the memory you can touch. The capability store is literally a separate bus and memory that isn't accessible by programs, so there are no secrets: even if you leak the pointer to a capability, it doesn't matter, because it's not in a place that "user code" can ever touch. This is fine in theory, but it's incredibly expensive in practice.

MIE is a much simpler notion that seems to use N-bit (maybe 4?) tags to protect heap allocations, and uses the SPTM to protect tag space from kernel compromise. If it's exactly as in the article: heap allocations get a tag, and any load/store to the heap needs to carry, in the pointer, the tag that was used for its allocation. The tag store used by the kernel allocator is protected by SPTM so you can't just dump the tags.
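
Apple hasn't published the exact mechanics, so this is just a toy sketch of how I'd model the check: a small tag carried in the unused high bits of the pointer, compared against a per-granule tag recorded by the allocator. The field positions, sizes, and names here are my guesses, not Apple's format:

    #include <stdbool.h>
    #include <stdint.h>

    /* Toy model: a 4-bit tag carried in bits 59:56 of the pointer
     * (roughly where MTE keeps it) and one tag per 16-byte granule
     * recorded by the allocator. In the real design the tag store is
     * protected (e.g. by SPTM) rather than being a plain array. */
    enum { TAG_SHIFT = 56, TAG_BITS = 4, GRANULE = 16, NGRANULES = 1 << 20 };
    #define TAG_MASK ((1ULL << TAG_BITS) - 1)

    static uint8_t granule_tags[NGRANULES];  /* stand-in tag store */

    static uint8_t  ptr_tag(uintptr_t p) { return (p >> TAG_SHIFT) & TAG_MASK; }
    static uintptr_t strip(uintptr_t p)  { return p & ~(TAG_MASK << TAG_SHIFT); }

    /* Allocation: pick a tag, record it for the granule, fold it into
     * the returned pointer's high bits. */
    static uintptr_t tag_alloc(uintptr_t addr, uint8_t tag) {
        granule_tags[(addr / GRANULE) % NGRANULES] = tag;
        return addr | ((uintptr_t)tag << TAG_SHIFT);
    }

    /* Every load/store: the pointer tag must match the granule tag; in
     * the synchronous model a mismatch faults before the access completes. */
    static bool check_access(uintptr_t p) {
        return ptr_tag(p) == granule_tags[(strip(p) / GRANULE) % NGRANULES];
    }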

If you combine MIE, SPTM, and PAC, you get close-ish to CHERI, but with independent building blocks. It's less robust, but also a less granular system with less overhead.

MIE is both probabilistic (N bits of entropy) and guarded by a slightly weaker hardware protection (SPTM, which to my understanding is a bus firewall, vs. a separate bus). It also only protects heap allocations, although existing mitigations protect the stack and execution flow.

Going off of the VERY limited information in the post, my naive read is that the biggest vulnerability here will be tag collision. If you try enough times with enough heap spray, or can groom the heap repeatedly, you can probably collide a tag with however many bits of entropy are present in the system. But because the model is synchronous, you will take a fault every time before that (unlike asynchronous MTE), so you'll get caught, which is a big problem for nation-state attackers.
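
To put rough numbers on that (assuming the tags really are 4 bits with one value reserved, which is how stock MTE works): a blind guess matches with probability about 1/15, so on average you need around 15 tries to land a collision, and in a synchronous model every miss along the way is a visible fault. The entropy isn't the protection so much as the fact that failed guesses are loud.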

replies(4): >>45189259, >>45191970, >>45192031, >>45192047
3. ysnp No.45188796
https://saaramar.github.io/memory_safety_blogpost_2022/ is a nice article from a while back that goes into this topic for MTE.
replies(1): >>45188881
4. bri3d No.45188881
And of note, the Apple implementation basically forces the invariants documented in the author's talk:

* use synchronous exceptions (“precise-mode”), which means the faulted instruction cannot retire and cause damage

* re-tag allocations on free
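
Apple's allocator internals aren't public, but on stock MTE that second invariant looks roughly like this with the ACLE intrinsics (heavily simplified, alignment and exact size handling omitted; compiled with something like -march=armv8.5-a+memtag):

    #include <arm_acle.h>   /* __arm_mte_* intrinsics (ARMv8.5-A MTE) */
    #include <stddef.h>
    #include <stdint.h>

    /* Re-tag on free: give the freed chunk's granules a fresh random tag
     * that differs from the old one, so any dangling pointer (still
     * carrying the old tag) faults on its next access. */
    static void retag_on_free(void *chunk, size_t size) {
        /* Exclude the chunk's current tag, then pick a random tag
         * outside that excluded set. */
        uint64_t excluded = __arm_mte_exclude_tag(chunk, 0);
        void *fresh = __arm_mte_create_random_tag(chunk, excluded);

        for (size_t off = 0; off < size; off += 16)  /* 16-byte granules */
            __arm_mte_set_tag((char *)fresh + off);
    }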

5. leoc No.45189259
Something I'm not clear about: is CHERI free and clear in patent terms, or do people have their hands out grasping for an MPEG-like licensing bonanza? If it's the latter then that might matter as much as purely technical obstacles to CHERI adoption.
replies(1): >>45192075
6. strcat No.45191970
The early ARM Cortex cores with MTE have full support for synchronous and asymmetric (synchronous on reads, asynchronous on writes) modes. Asynchronous was near zero cost and asymmetric comparable to a mitigation like MTE. This has been available since the launch of the Pixel 8 for Android. GrapheneOS began using it the month the Pixel 8 launched, after integrating it into hardened_malloc. It currently uses synchronous mode for the kernel and asymmetric for userspace. EMTE refers to FEAT_MTE4, a standard ARM extension with the 4th round of MTE features; it isn't Apple specific.
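
For anyone who wants to poke at the stock modes on Linux/Android: a process opts in via prctl() and then asks for tag checking per mapping with PROT_MTE. A minimal sketch for synchronous mode (assumes recent kernel headers; the PROT_MTE fallback value is the arm64 one):

    #include <linux/prctl.h>   /* PR_MTE_* on older glibc */
    #include <sys/mman.h>
    #include <sys/prctl.h>

    #ifndef PROT_MTE
    #define PROT_MTE 0x20      /* arm64-specific protection flag */
    #endif

    int enable_sync_mte(void) {
        /* Enable tagged addresses with synchronous tag-check faults.
         * The 0xfffe include mask lets IRG generate tags 1-15,
         * reserving tag 0 for untagged memory. */
        if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                  PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
                  (0xfffeUL << PR_MTE_TAG_SHIFT),
                  0, 0, 0) != 0)
            return -1;

        /* Mappings still have to opt in to tag checking: */
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_MTE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? -1 : 0;
    }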

MTE is 4 bits with 16-byte granularity. There's usually at least 1 tag reserved, so there are 15 random tags. It's possible to dynamically exclude tags to get extra deterministic guarantees. GrapheneOS excludes the previous random tag and the adjacent random tags, so there are 3 dynamically excluded tags which were themselves random.
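
Roughly what that exclusion looks like with the ACLE intrinsics (very simplified, not the actual hardened_malloc code):

    #include <arm_acle.h>
    #include <stdint.h>

    /* Pick a tag for a new allocation while excluding the tags of the
     * adjacent slots and the slot's previous tag, so linear overflows
     * and immediate use-after-free are caught deterministically rather
     * than with ~1/15 probability. */
    static void *choose_tag(void *slot, void *left, void *right, void *prev) {
        uint64_t excluded = 0;
        excluded = __arm_mte_exclude_tag(left,  excluded);
        excluded = __arm_mte_exclude_tag(right, excluded);
        excluded = __arm_mte_exclude_tag(prev,  excluded);
        /* Tag 0 is typically reserved separately via the per-process
         * tag mask. */
        return __arm_mte_create_random_tag(slot, excluded);
    }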

Linux kernel MTE integration for internal usage is not very security focused and has to be replaced with a security-focused implementation integrated with pKVM at some point. Google's recently launched Advanced Protection feature currently doesn't use kernel MTE.

7. astrange No.45192031
SPTM isn't a hardware feature; it's basically a hypervisor that manages the page tables and tag memory so that the kernel doesn't own its own tags.
8. jrtc27 No.45192047
> has a parallel capability stack

There is one stack, the normal program stack, which lives in normal main memory.

> capability pointers

If you use pure-capability CHERI C/C++ then there is only one type of pointer to manage; they just are implemented as capabilities rather than integers. They're also just extensions of the existing integer registers; much as 64-bit systems extend 32-bit registers, CHERI capability registers extend the integer registers.
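
To make that concrete, purecap CHERI C looks roughly like this (a sketch using the cheriintrin.h helpers from CHERI Clang; untested here, so treat the details as approximate):

    #include <cheriintrin.h>   /* CHERI Clang helpers */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* In purecap code this is already a capability: its bounds and
         * permissions come from the allocation, no annotations needed. */
        int *buf = malloc(16 * sizeof(int));

        printf("valid=%d length=%zu\n",
               (int)cheri_tag_get(buf), (size_t)cheri_length_get(buf));

        /* Optionally narrow the bounds before passing it elsewhere. */
        int *half = (int *)cheri_bounds_set(buf, 8 * sizeof(int));

        half[8] = 1;   /* out of bounds for the narrowed capability:
                          traps with a capability fault instead of
                          silently corrupting memory */
        free(buf);
        return 0;
    }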

> requires microarchitectural support for a tag storage memory

Also true of MTE?

> your memory operations also need to provide a pointer to a capability object stored in the capability store

There is no "capability object stored in the capability store". The capability is just a thing that lives in main memory that you provide as your register operand to the memory instruction. Instead of `ldr x0, [x1]` to load from the address `x1` into `x0`, you do `ldr x0, [c1]` to load from the capability `c1`. But `c1` has all of the capability; there is no indirection. It sounds like you are thinking of classical capability systems that did have that kind of indirection, but an explicit design goal of CHERI is to not do that in order to be much more aligned with contemporary microarchitecture.

> The capability store is literally a separate bus and memory that isn't accessible by programs,

As above, there is no separate bus, and capabilities are not in separate memory. Everything lives in main memory and is accessed using the same bus. The only difference is there are now capability tags being stored alongside that data, with different schemes possible (wider SRAM, DRAM ECC bits, carving out a bit of main memory so the memory controller can store tags there and pretend to the rest of the system that memory itself stores tags). To anything interacting with the memory subsystem, there is one bus, and the tags flow with the data on it.

replies(1): >>45193452
9. jrtc27 No.45192075
Cambridge and Arm have made a joint statement that nothing essential to the deployment of CHERI ("capability essential IP") is being patented by them: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-953.pdf. As with any patent issue, you should consult your legal team and not take anyone else's word for it: patent law is a minefield, and who knows what patents may be lurking out there that nobody realises happen to cover some aspect of CHERI, or design choices in a particular implementation of it, as with any processor technology. But we are not out to patent it. We believe that the right thing to do is to make the technology open in order to allow it to be widely used for the good of the field.
10. bri3d No.45193452
> To anything interacting with the memory subsystem, there is one bus, and the tags flow with the data on it.

To the architecture, there is one access mechanism with the tag bit set and one separate mechanism with the tag bit unset, no?

I thought this was the whole difference: in MTE, there is a secret tag hidden in a “normal” pointer by the allocator, and in CHERI, there is a separate architectural route for tag=0 (normal memory) and tag=1 (capabilities memory), whether that separate route eventually goes to some partition of main memory, a separate store entirely, ECC bit stuffing, or whatever?

replies(1): >>45194513
11. jrtc27 No.45194513
No. The capability itself lives in normal memory intermingling with data just like any other pointer. There is no "capabilities memory", it is just memory.

In MTE, you have the N-bit (typically 4) per-granule (typically 16 byte) "colour"/tag that is logically part of the memory but the exact storage details are abstracted by the implementation. In CHERI, you have the 1-bit capability tag that is logically part of the memory but the exact storage details are abstracted by the implementation. If you understand how MTE is able to store the colours to identify the different allocations in memory (the memory used for the allocations, not the pointers to the allocations) then you understand how CHERI stores the tags for its capabilities, because they are the same basic idea. The difference comes in how they're used: in MTE, they identify the allocation, which means you "paint" the whole allocation with the given "colour" at allocation time (malloc, new, alloca / stack variables, load time for globals), but in CHERI, they identify valid capabilities, and so only get set when you write a valid capability to that memory location (atomically and automatically). This leads to very different access patterns and densities (e.g. MTE must tag all data regardless of its type, whereas CHERI only tags pointers, meaning large chunks of plain data have large chunks of zero tag bits, so how you optimise your microarchitecture changes).
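
A tiny purecap sketch of that last point (again using the cheriintrin.h helpers, details approximate): writing a pointer to memory sets the location's tag automatically, and overwriting the same bytes with plain data clears it:

    #include <cheriintrin.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        int x = 42;
        void *slot;

        slot = &x;   /* a valid capability: cheri_tag_get reports 1 */
        printf("tag after pointer store: %d\n", (int)cheri_tag_get(slot));

        memset(&slot, 0xAA, sizeof(slot));   /* plain data write over the
                                                same bytes clears the tag */
        printf("tag after data overwrite: %d\n", (int)cheri_tag_get(slot));

        /* Dereferencing slot now would fault: the bytes may still look
         * like a pointer, but the capability tag is gone. */
        return 0;
    }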

Perhaps you're getting confused with details about the "tag table + cache" implementation for how tags can be stored in commodity DRAM? For CHERI you really want 129-bit word (or some multiple thereof) memory, but commodity DRAM doesn't give you that. So as part of the memory controller (or just in front of it) you can put a "tag controller" which hides a small (< 1%) fraction of the memory and uses it to store the tags for the rest of the memory, with various caching tricks to make it go fast. But that is just the tag, and that is an implementation detail for how to pretend that your memory can tag data. You could equally have an implementation that uses wider DRAM (e.g. in the case of DRAM with ECC bits to spare). Both schemes have been implemented. But importantly memory is just 128+1-bit; the same 128 bits always store the data, whether it's some combination of integers and floats, or the raw bytes of a capability. In the former case, the 129th tag bit will be kept as 0, and in the latter case it will be kept as whatever the capability's tag is (hopefully 1).
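
For a sense of the scale of that carve-out: one tag bit per 128 bits of data is 1/128, about 0.8% of the data, or roughly 0.78% of total storage once the tag bits themselves are counted, so a 16 GiB system sets aside on the order of 128 MiB for the tag table (plus whatever the tag cache costs).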