Hacktical C: practical hacker's guide to the C programming language

(github.com)

218 points signa11 | 1 comments | 14 Apr 25 10:20 UTC

9d ◴[14 Apr 25 13:49 UTC] No.43681256[source]▶

> C doesn't try to save you from making mistakes. It has very few opinions about your code and happily assumes that you know exactly what you're doing. Freedom with responsibility.

I love C because it doesn't make my life very inconvenient to protect me from stubbing my toe in it. I hate C when I stub my toe in it.

replies(5): >>43682578 #>>43683142 #>>43683157 #>>43683835 #>>43684772 #

oconnor663 ◴[14 Apr 25 18:54 UTC] No.43684772[source]▶

>>43681256 #

> It has very few opinions about your code

I understand where this is coming from, but I think this is less true than it used to be, and (for that reason) it often devolves into arguments about whether the C standard is the actual source of truth for what you're "really" allowed to do in C. For example, the standard says I must never:

- cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)

- allow a signed integer to overflow

- pass a NULL pointer to memcpy, even if the length is zero

- read an uninitialized object, even if I "don't care" what value I get

- read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that

All of these are ways that (modern, standard) C doesn't really "do what the programmer said". A lot of big real-world projects build with flags like -fno-strict-aliasing, so that they can get away with doing these things even though the standard says they shouldn't. But then, are they really writing C or "C with custom extensions"? When we compare C to other languages, whose extensions are we talking about?

replies(1): >>43701472 #

ryao ◴[16 Apr 25 04:27 UTC] No.43701472[source]▶

>>43684772 #

  cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)

Use the union type. Abusing it for aliasing violates the standard too, but GCC and Clang implement an extension that permits this. Alternatively, just allocate a char array and cast it as you please. Strict aliasing does not apply to char arrays if I recall.
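A minimal sketch of the union approach, with `memcpy` shown as a commonly used alternative. The function names are made up for illustration, and a 32-bit `float` is assumed (checked at compile time). GCC documents the union form as supported even with `-fstrict-aliasing`, provided the access goes through the union type.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret the bits of a float by writing one union member
   and reading the other. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Same result by copying the object representation byte for byte.
   Compilers typically turn this small fixed-size memcpy into a plain move. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both helpers avoid dereferencing a `float *` that was cast to `uint32_t *`, which is the pattern the aliasing rules are aimed at.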

  allow a signed integer to overflow

Is this still true? I thought the reason for this was that C left it to the implementation to define how signed arithmetic worked, meaning you could not assume two’s complement, but the most recent C standard was supposed to mandate two’s complement.

  pass a NULL pointer to memcpy, even if the length is zero

There is a reason for this. memcpy is allowed to start reading early as a performance optimization, before it does a branch that checks if reading is okay. I do wonder what happens if you only want to copy 1 byte and that byte has invalid memory right next to it. Presumably, this optimization would read more than a byte.

  read an uninitialized object, even if I "don't care" what value I get

You are probably doing something wrong if you do this. It is not even good as an entropy source.

  read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that

Earlier C standards likely did not say anything about this because they did not support multithreading, but outside of possibly reading/writing to hardware registers, you do not want to do this because of races. Even if you think you know better, you almost certainly do not.
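For reference, a minimal sketch of what the standard-sanctioned version can look like with C11 `<stdatomic.h>`: one thread publishes a value and another picks it up, with the flag as the only atomic object. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical, and thread creation is left out to keep the sketch short.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, written before the flag is set */
static atomic_bool ready;  /* zero-initialized, so it starts out false */

/* Writer side: store the data, then set the flag with release ordering
   so the data write cannot be reordered after the flag write. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: only touch the data after an acquire load saw the flag.
   Returns false if nothing has been published yet. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

On many ISAs these compile to ordinary loads and stores plus, at most, a barrier, so the cost of staying inside the standard is often small.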

replies(3): >>43701746 #>>43703321 #>>43704285 #

lifthrasiir ◴[16 Apr 25 05:17 UTC] No.43701746[source]▶

>>43701472 #

> the most recent C standard was supposed to mandate two’s complement.

While that's true, overflows are not automatically wrapping because they instead may trap for several reasons. (C++ does require wrapping now in comparison. [1])

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
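Since signed overflow is still not something you can rely on in C, a common workaround is to test the operands before adding. A minimal sketch, where `checked_add` is a hypothetical helper name rather than a standard function:

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b without ever evaluating an overflowing expression.
   Returns false and leaves *sum untouched if the result would not fit. */
static bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

The range check uses only subtractions that cannot overflow themselves, which is why it is safe to run before the addition.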

> memcpy is allowed to start reading early as a performance optimization, [...]

Most modern memcpy implementations would branch on the length anyway, because word-based copying is generally faster than byte-based copying whenever possible. Also many would try SIMD when the copy size exceeds some threshold for the same reason.
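Whatever the library does internally, callers who may end up with a NULL pointer and a zero length can sidestep the question with a small guard. A minimal sketch, where `copy_bytes` is a hypothetical wrapper name:

```c
#include <stddef.h>
#include <string.h>

/* Forwards to memcpy only when there is something to copy, so a
   (NULL, NULL, 0) call never reaches the library function. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;
    return memcpy(dst, src, n);
}
```

The extra branch is the same kind of length check the implementation is likely to perform anyway.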

>> read an uninitialized object, even if I "don't care" what value I get

> You are probably doing something wrong if you do this.

The GP meant a case like this. Consider `struct foo { bool avail; int value; } foos[100];` where `value` would be only set when `avail` is true. If we are summing all available `value`s, we may want to avoid a branch misprediction by something like `accum += foos[i].avail * foos[i].value;` for each `foos[i]`, since the actual `value` shouldn't matter when `avail` is false. But the current specification prohibits this construction because it considers that each read from `foos[i].value` may be different from each other (!). In reality, this kind of issue is so widespread that LLVM has a special "poison" value which gets resolved to some fixed value after the first use.

replies(1): >>43702937 #

ryao ◴[16 Apr 25 08:30 UTC] No.43702937[source]▶

>>43701746 #

Thanks for the explanations.

As for the last one, I would probably bzero() that structure, as it is faster than setting just 1 field to zero in a loop, which presumably is what you would do until you have some need to “allocate” a value. That would avoid the problem entirely.

I know bzero() was removed from POSIX, but “bzero()” is nicer to write than “memset() it to zero”.
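A minimal sketch of that idea, using `memset()` since `bzero()` is not in the C standard: zero the whole array once, and the branchless sum from the parent comment never reads an uninitialized `value`. The helper names `foos_reset` and `foos_sum` are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct foo { bool avail; int value; };

/* Zero every byte of the array: avail becomes false and value becomes 0. */
static void foos_reset(struct foo *foos, size_t n)
{
    memset(foos, 0, n * sizeof *foos);
}

/* Branchless sum: entries with avail == false contribute 0 * value. */
static int foos_sum(const struct foo *foos, size_t n)
{
    int accum = 0;
    for (size_t i = 0; i < n; i++)
        accum += foos[i].avail * foos[i].value;
    return accum;
}
```

Because every `value` has been written at least once, the multiply trick stays well-defined whether or not `avail` is set.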
