I love C because it doesn't make my life very inconvenient to protect me from stubbing my toe in it. I hate C when I stub my toe in it.
I understand where this is coming from, but I think this is less true than it used to be, and (for that reason) it often devolves into arguments about whether the C standard is the actual source of truth for what you're "really" allowed to do in C. For example, the standard says I must never:
- cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)
- allow a signed integer to overflow
- pass a NULL pointer to memcpy, even if the length is zero
- read an uninitialized object, even if I "don't care" what value I get
- read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that
All of these are ways that (modern, standard) C doesn't really "do what the programmer said". A lot of big real-world projects build with flags like -fno-strict-aliasing, so that they can get away with doing these things even though the standard says they shouldn't. But then, are they really writing C or "C with custom extensions"? When we compare C to other languages, whose extensions are we talking about?
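For the first item in the list, here is a minimal sketch of one form that stays inside the standard without special flags: copy the bytes with `memcpy` instead of accessing the object through a cast pointer. The `struct Foo`/`struct Bar` layouts and the `foo_to_bar` helper are hypothetical, made up for illustration, and the result assumes `float` is a 32-bit IEEE 754 value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts, for illustration only. */
struct Foo { uint32_t bits; };
struct Bar { float value; };

_Static_assert(sizeof(struct Foo) == sizeof(struct Bar),
               "Foo and Bar must have the same size");

/* Reinterpret the bytes of a Foo as a Bar by copying them,
   rather than casting struct Foo* to struct Bar*. */
static struct Bar foo_to_bar(const struct Foo *foo)
{
    struct Bar bar;
    assert(foo != NULL);
    memcpy(&bar, foo, sizeof bar);
    return bar;
}
```

Compilers typically turn a fixed-size `memcpy` like this into a plain register move, so it usually costs nothing compared to the cast.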
> cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)
Use the union type. Abusing it for aliasing violates the standard too, but GCC and Clang implement an extension that permits this. Alternatively, just allocate a char array and cast it as you please. Strict aliasing does not apply to char arrays, if I recall correctly.
> allow a signed integer to overflow
Is this still true? I thought the reason was that C left it to the implementation to define how signed arithmetic worked, meaning you could not assume two’s complement, but the most recent C standard was supposed to mandate two’s complement.
> pass a NULL pointer to memcpy, even if the length is zero
There is a reason for this. memcpy is allowed to start reading early as a performance optimization, before it reaches the branch that checks whether any reading is needed at all. I do wonder what happens if you only want to copy 1 byte and that byte has invalid memory right next to it. Presumably, this optimization would read more than a byte.
> read an uninitialized object, even if I "don't care" what value I get
You are probably doing something wrong if you do this. It is not even good as an entropy source.
> read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that
Earlier C standards likely did not say anything about this because they did not support multithreading, but outside of possibly reading/writing to hardware registers, you do not want to do this because of races. Even if you think you know better, you almost certainly do not.
While that's true, signed overflow still does not automatically wrap, because implementations may instead trap for several reasons. (C++20 made the same change: it requires two's complement representation, but signed overflow remains undefined there as well. [1])
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
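Since overflow itself stays undefined, the portable way to handle it is to test before adding. A minimal sketch, where `checked_add` is a hypothetical helper name and not something from the standard library (GCC and Clang offer `__builtin_add_overflow`, and C23 adds `ckd_add` in `<stdckdint.h>`, for the same purpose):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns true,
   or returns false when the sum is not representable as an int.
   The comparisons are arranged so that no step can overflow. */
static bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```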
> memcpy is allowed to start reading early as a performance optimization, [...]
Most modern memcpy implementations would branch on the length anyway, because word-based copying is generally faster than byte-based copying whenever possible. Also many would try SIMD when the copy size exceeds some threshold for the same reason.
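On the caller's side, a minimal sketch of staying within the rule discussed here is to skip the call entirely when the length is zero. `copy_bytes` is a hypothetical wrapper name, not an existing library function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: return early when n is zero, so a NULL
   pointer paired with a zero length never reaches memcpy. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

The extra branch is cheap, and as noted above most implementations branch on the length anyway.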
>> read an uninitialized object, even if I "don't care" what value I get
> You are probably doing something wrong if you do this.
The GP meant a case like this. Consider `struct foo { bool avail; int value; } foos[100];` where `value` is only set when `avail` is true. If we are summing all available `value`s, we may want to avoid branch mispredictions with something like `accum += foos[i].avail * foos[i].value;` for each `foos[i]`, since the actual `value` shouldn't matter when `avail` is false. But the current specification prohibits this construction because it considers that each read of an uninitialized `foos[i].value` may yield a different value (!). In reality, this kind of issue is so widespread that LLVM has a special "poison" value which gets resolved to some fixed value after the first use.
As for the last one, I would probably bzero() that structure, as it is faster than setting just 1 field to zero in a loop, which presumably is what you would do until you have some need to “allocate” a value. That would avoid the problem entirely.
I know bzero() was removed from POSIX, but “bzero()” is nicer to write than “memset() it to zero”.
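A minimal sketch of that approach, using `memset` in place of `bzero()`: zero the whole array once, and the branchless sum from upthread then only ever reads initialized values. The helper names `foos_clear` and `foos_sum` are hypothetical; the `struct foo` layout is the one from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct foo { bool avail; int value; };

/* Zero every slot once, so value is initialized even where
   avail is false. */
static void foos_clear(struct foo *foos, size_t n)
{
    assert(foos != NULL);
    memset(foos, 0, n * sizeof *foos);
}

/* Branchless sum: avail is 0 or 1, so unavailable slots
   contribute nothing regardless of what value holds. */
static long foos_sum(const struct foo *foos, size_t n)
{
    long accum = 0;
    assert(foos != NULL);
    for (size_t i = 0; i < n; i++)
        accum += (long)foos[i].avail * foos[i].value;
    return accum;
}
```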