The provenance memory model for C

1. nikic [30 Jun 25 20:45 UTC] No.44427669

At least on a skim, what this specifies for exposure/synthesis on reads/writes of the object representation is concerning. One consequence is that dead integer loads cannot be eliminated, as they may have an exposure side effect. I guess C might be able to get away with it due to the interaction with strict aliasing rules. Still, I'm quite surprised that they are going against consensus here (which also reduces the likelihood that these semantics will be adopted by implementers).
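
To make the concern concrete, here is a minimal sketch, assuming the reading above that any non-pointer access to a pointer's object representation marks its storage as exposed. The names `target`, `touch_representation`, and `deref_after_touch` are hypothetical. At run time the code behaves the same either way; what changes is whether the final dereference has defined behavior in the abstract machine, which is why the dead byte load could not simply be deleted.

```c
#include <stdint.h>

/* Hypothetical object whose address is not otherwise exposed. */
static int target = 42;

/* Reads one byte of the object representation of *pp and discards it.
   Under the exposure rules discussed above (an assumption of this sketch),
   this load is not side-effect free: it marks the storage *pp points to
   as exposed. */
static void touch_representation(int *const *pp) {
    const unsigned char *bytes = (const unsigned char *)pp;
    unsigned char first = bytes[0]; /* dead load: the value is never used */
    (void)first;
}

/* Converts an integer back to a pointer and dereferences it. Whether this
   is defined depends on whether target's storage was exposed earlier, so
   deleting the dead load above could change the meaning of this code. */
int deref_after_touch(uintptr_t addr) {
    int *p = &target;
    touch_representation(&p);
    int *q = (int *)addr; /* synthesis: can only pick up exposed provenance */
    return *q;
}
```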

2. uecker [30 Jun 25 21:01 UTC] No.44427836

>>44427669 (TP)

(Never mind, I misread your comment at first.) Yes, the representation access needs to be discussed... It took a couple of years to publish this document. What matters more is whether the ptr2int exposure can be implemented.

3. comex [30 Jun 25 21:59 UTC] No.44428359

>>44427669 (TP)

> I guess C might be able to get away with it due to the interaction with strict aliasing rules.

But not for char-typed accesses. And even for larger types, I think you would have to worry about the combination of first memcpying from pointer-typed memory to integer-typed memory, then loading the integer. If you eliminate dead integer loads, then you would have to keep the memcpy.
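
A sketch of the two-step pattern described above, with a hypothetical function name. It assumes `uintptr_t` and `int *` have the same size and that a pointer's bit pattern equals its integer value (a flat address space); both hold on common platforms but are not guaranteed by C. The copy goes through untyped `memcpy`, so type-based alias information on the later `uintptr_t` load cannot tell that the bytes originally came from a pointer.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uintptr_t) == sizeof(int *),
               "sketch assumes int * and uintptr_t have the same size");

/* Hypothetical helper: copies the bytes of pointer p into the integer
   object *int_slot, then loads that integer and discards the value. */
void copy_then_dead_load(int *p, uintptr_t *int_slot) {
    /* Untyped copy from pointer-typed memory (&p) into integer-typed
       memory. TBAA has nothing to say about this step. */
    memcpy(int_slot, &p, sizeof *int_slot);

    /* uintptr_t-typed load of what are really pointer bytes. Its TBAA type
       is not pointer-compatible, yet it reads a pointer's representation.
       If this dead load is deleted, the exposure has to be attributed to
       the memcpy, which then cannot be deleted as well. */
    uintptr_t loaded = *int_slot;
    (void)loaded;
}
```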

4. ben0x539 [30 Jun 25 23:24 UTC] No.44428989

>>44427669 (TP)

Can you say more about what the consensus is that this is going against?

5. alextingle [01 Jul 25 09:27 UTC] No.44432092

>>44427669 (TP)

I don't imagine that the exposed state would need to be represented in the final compiler output, so the optimiser could mark the pointer as exposed, but still eliminate the dead integer load.

Or from a pragmatic viewpoint, perhaps if the optimiser eliminates a dead load, then don't mark the pointer as exposed? After all, the whole point is to keep track of whether a synthesised pointer might potentially refer to the exposed pointer's storage. There's zero danger of that happening if the integer load never actually occurs.

6. nikic [01 Jul 25 19:14 UTC] No.44437131

>>44428989

That type punning through memory does not expose or synthesize provenance. There are some possible variations on this, but the most straightforward is that pointer-to-integer transmutes just return the address (without exposure) and integer-to-pointer transmutes return a pointer with nullary provenance.
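
A minimal sketch contrasting casts with transmutes through memory. All function names are hypothetical, and the transmute comments describe the alternative model from this comment, not what the TS specifies. Casts expose (pointer to integer) and synthesize (integer to pointer); under this model the `memcpy`-based transmutes would do neither. It assumes a flat address space in which a pointer's bit pattern equals its integer value.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uintptr_t) == sizeof(void *),
               "sketch assumes void * and uintptr_t have the same size");

/* Casts: pointer-to-integer exposes p; integer-to-pointer synthesizes a
   pointer that may pick up the provenance of exposed storage. */
uintptr_t ptr_to_int_cast(void *p) { return (uintptr_t)p; }
void *int_to_ptr_cast(uintptr_t a) { return (void *)a; }

/* Transmute through memory: under the model described above this returns
   only the address and does NOT expose p. */
uintptr_t ptr_to_int_transmute(void *p) {
    uintptr_t a;
    memcpy(&a, &p, sizeof a);
    return a;
}

/* Transmute through memory: under the same model this yields a pointer
   with nullary (empty) provenance. Its address compares equal to the
   original, but dereferencing it would be undefined. */
void *int_to_ptr_transmute(uintptr_t a) {
    void *p;
    memcpy(&p, &a, sizeof p);
    return p;
}
```

Note that the usage below only compares the transmuted pointer; it dereferences the cast result, not the transmute result.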

7. nikic [01 Jul 25 19:52 UTC] No.44437436

>>44428359

That's a great point. I initially thought we could assume no exposure for loads with non-pointer-compatible TBAA, but you are right that this is not correct if the memory has been laundered through memcpy.

8. Hercuros [01 Jul 25 20:48 UTC] No.44437878

>>44432092

I guess the internal exposure state would be “wrong” if the compiler removes the dead load (e.g. in a pass that runs before provenance analysis).

However, if all of the program paths from that point onward behave the same as if the pointer was marked as exposed, that would be fine. It’s only “wrong” to track the incorrect abstract machine state when that would lead to a different behaviour in the abstract machine.

In that sense I suppose it’s no different from things like removing a variable initialisation if the variable is never used. That also has a side effect in the abstract machine, but it can still be optimised out if that abstract machine side effect is not observable.
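
As a sketch of this argument (hypothetical function name, and assuming the rest of the program never converts an integer back to a pointer): the dead byte load below would expose `p` under the rules discussed in this thread, but if no later code consults the exposed state, deleting it changes nothing observable, just as deleting the unused initialisation does.

```c
/* Hypothetical example. Assumes no integer-to-pointer conversion happens
   anywhere after this point, so the exposed state is never consulted. */
int dead_exposure_unobservable(int *p) {
    /* Dead byte load of p's representation: an abstract-machine side
       effect (exposure) that no later behavior depends on. */
    unsigned char first = ((const unsigned char *)&p)[0];
    (void)first;

    /* Unused initialisation: also an abstract-machine effect, also
       removable because nothing can observe it. */
    int unused = 123;
    (void)unused;

    return *p + 1;
}
```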

9. uecker [02 Jul 25 06:02 UTC] No.44440646

>>44437436

You can still eliminate the memcpy if you mark the pointer as exposed at this point.
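
A before/after sketch of that transformation, with hypothetical names. The exposure marker is written here as a discarded pointer-to-integer cast, assuming such a cast counts as an exposure in the abstract machine even when its value is unused; a real compiler would more likely keep an explicit marker in its IR than in C source.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(uintptr_t) == sizeof(int *),
               "sketch assumes int * and uintptr_t have the same size");

/* Before: p's bytes are copied into an integer whose value is never used. */
int expose_before(int *p) {
    uintptr_t bits;
    memcpy(&bits, &p, sizeof bits); /* copy of pointer bytes: exposes p
                                       under the reading discussed above */
    uintptr_t loaded = bits;        /* dead integer load */
    (void)loaded;
    return *p;
}

/* After: memcpy and load removed. The discarded cast stands in for an
   "exposed here" marker (assumption: a pointer-to-integer cast exposes
   even when its value is unused); with optimization it typically
   generates no machine code. */
int expose_after(int *p) {
    (void)(uintptr_t)p;
    return *p;
}
```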