←back to thread

146 points returningfory2 | 3 comments | | HN request time: 0s | source
Show context
packetlost ◴[] No.43645287[source]
I don't think it's actually "flattening" the enums discriminant spaces (though maybe it is). My guess is this is just 32-bit alignment requirement (ie. bytes 1:4 are padding) + sub-byte discriminant sizes. The surprising thing here is that the ordering of Outer's variants doesn't match the ordering as defined, instead having variant D's discriminant be 0 (0 << size_of::<Inner's discriminant>). size_of::<Inner> is actually 33 bits and size_of::<Outer> is 34 bits and then you just apply alignment requirements to each field. Actual size_of calls will account for alignment and padding, but the compiler knows the truth.

What's cool about this in general is nested match statements can be flattened into a jumplist (idk if rustc does this, but it's possible in theory).

replies(2): >>43646587 #>>43649004 #
LegionMammal978 ◴[] No.43649004[source]
> I don't think it's actually "flattening" the enums discriminant spaces (though maybe it is).

It is. One easy way to see this is with an Option<Option<bool>> [0]: if both options are Some, it takes the value 0 or 1 depending on the boolean; if the inner Option is None, it takes the value 2; and if the outer Option is None, it takes the value 3. And as you keep adding more Options, they take values 4, 5, 6, etc. to represent None, while still only taking up 1 byte.

[0] https://play.rust-lang.org/?version=stable&mode=debug&editio...

replies(1): >>43650162 #
packetlost ◴[] No.43650162[source]
I think that's just niche optimization. If you change from bool to a u8 it doesn't use the invalid bit pattern as a discriminant even though it could: https://play.rust-lang.org/?version=stable&mode=debug&editio...
replies(2): >>43650492 #>>43650496 #
1. LegionMammal978 ◴[] No.43650492[source]
What invalid bit pattern? A u8 can be anything from 0 to 255, so the Option necessarily has to put its discriminant into another byte. If you replace it with a NonZeroU8, then the compiler will duly use the forbidden 0 value for the first Option level, and a separate byte for all further levels.

(Granted, in the None variant, the byte used for the u8 is not usable, but if we're already using a separate discriminant byte, 256 variants should be plenty.)

replies(1): >>43654032 #
2. packetlost ◴[] No.43654032[source]
The same way as it does in the bool case? The u8 bits are invalid if either of the Options are None, but in particular if the Outer option is Some the Inner is None the bit that would otherwise be used for the bool (in the first example) is used to discriminate Outer, but doesn't do so in the case of the u8.
replies(1): >>43657240 #
3. LegionMammal978 ◴[] No.43657240[source]
I don't really get what distinction you're trying to draw between niche optimization and the bitwise stuff you keep talking about. As far as I am aware, the Rust compiler never works by counting bits in any context. The LLVM backend sometimes does, but it's not responsible for enum layout.

The mental model is, an enum payload will have some number of integer/pointer values with niches in their representation. Niches don't work by counting bits, they work as numeric ranges. E.g., a char is just a u32 with values 0 through 0x10ffff, a bool is a u8 with values 0 and 1, a reference is just a pointer with any value except 0, etc., and the niche is precisely the negation of this range.

Sometimes the niche corresponds to bits used in a valid value (e.g., the 0 value of a NonZeroU8), and sometimes it corresponds to other bits (e.g., values 2 through 255 of a bool): the compiler only cares about the ranges, not the bits. If there is no large-enough niche, then the discriminant is placed in a separate byte.

An outer enum can't use sometimes-valid payloads in an inner enum to represent its discriminant, if that's what you're trying to say. Multiple discriminants can be 'flattened' into a single continuous range of niche values, but they can't be 'flattened' into inner enums' payloads. That would cause a weird inversion where you need to read the inner discriminant just to know whether the outer discriminant is valid.

(The compiler does have a few tricks up its sleeve to make the most of niches. E.g., in a Result<(u8, bool), u8>, an Err(42) becomes [42, 2], but in a Result<(bool, u8), u8>, an Err(42) becomes [2, 42] (https://play.rust-lang.org/?version=stable&mode=debug&editio...). The 42 is repositioned to keep the niche intact.)