The C23 edition of Modern C

(gustedt.wordpress.com)
515 points bwidlar | 46 comments
1. ralphc ◴[] No.41851601[source]
How does "Modern" C compare safety-wise to Rust or Zig?
replies(4): >>41852048 #>>41852113 #>>41852498 #>>41856856 #
2. renox ◴[] No.41852048[source]
You'd be surprised: Zig has one UB (Undefined Behaviour) that C doesn't have!

In release fast mode, unsigned overflow/underflow is undefined in Zig whereas in C it wraps.

:-)

Of course, C has many UBs that Zig doesn't have, so C is far less safe than Zig, especially since you can use ReleaseSafe in Zig.

replies(2): >>41852363 #>>41852615 #
3. WalterBright ◴[] No.41852113[source]
Modern C still promptly decays an array to a pointer, so no array bounds checking is possible.

D does not decay arrays, so D has array bounds checking.

Note that array overflow bugs are consistently the #1 problem with shipped C code, by a wide margin.

replies(2): >>41852316 #>>41857792 #
4. layer8 ◴[] No.41852316[source]
> no array bounds checking is possible.

This isn’t strictly true: a C implementation is allowed to associate memory-range (or more generally, pointer provenance) metadata with a pointer.

The DeathStation 9000 features a conforming C implementation which is known to catch all array bounds violations. ;)

replies(4): >>41852348 #>>41852932 #>>41854734 #>>41855111 #
5. uecker ◴[] No.41852348{3}[source]
Right. Also, it might sound like array-to-pointer decay is forced onto the programmer. Instead, you can take the address of an array just fine without letting it decay. The type then preserves the length.
replies(2): >>41853029 #>>41854211 #
6. uecker ◴[] No.41852363[source]
UB does not automatically make things unsafe. You can have a compiler that implements safe defaults for most UB, and then it is not unsafe.
replies(4): >>41852548 #>>41853004 #>>41853083 #>>41853762 #
7. jandrese ◴[] No.41852498[source]
Modern C is barely any different from older C. The language committee for C is extremely conservative; changes tend to happen only around the edges.
replies(1): >>41857923 #
8. ahoka ◴[] No.41852548{3}[source]
By definition UB cannot be safe.
replies(3): >>41853174 #>>41854910 #>>41858758 #
9. secondcoming ◴[] No.41852615[source]
Does C automatically wrap? I thought you need to pass `-fwrapv` to the compiler to ensure that.
replies(3): >>41852833 #>>41852848 #>>41852877 #
10. greyw ◴[] No.41852833{3}[source]
Unsigned overflow wraps. Signed overflow is undefined behavior.
replies(1): >>41852909 #
11. renox ◴[] No.41852848{3}[source]
-fwrapv is for signed integer overflow not unsigned.
replies(1): >>41853085 #
12. ◴[] No.41852877{3}[source]
13. kbolino ◴[] No.41852909{4}[source]
This distinction does not exist in K&R 2/e which documents ANSI C aka C89, but maybe it was added in a later version of the language (or didn't make it into the book)? According to K&R, all overflow is undefined.
replies(1): >>41853245 #
14. TZubiri ◴[] No.41852932{3}[source]
"The DeathStation 9000"

The what now?

replies(2): >>41853018 #>>41853918 #
15. duped ◴[] No.41853004{3}[source]
That's implementation defined behavior, not undefined behavior. Undefined behavior explicitly refers to something the compiler does not provide a definition for, including "safe defaults."
replies(2): >>41853169 #>>41854616 #
16. layer8 ◴[] No.41853018{4}[source]
Google it.
replies(1): >>41854150 #
17. codr7 ◴[] No.41853029{4}[source]
Nice, when you know the length at compile time, which is rarely the case in my experience.

The holy grail is runtime access to the length, which means an array would have to be backed by something more elaborate.

replies(1): >>41856489 #
18. ◴[] No.41853083{3}[source]
19. sp1rit ◴[] No.41853085{4}[source]
Yes, as unsigned overflow is fine by default. AFAIK the issue was originally that there were still machines that used ones' complement for representing negative integers instead of the now customary two's complement.
20. fuhsnn ◴[] No.41853169{4}[source]
Compilers are not prohibited from providing their own definition for UB; that's how UBSan exists.
21. marssaxman ◴[] No.41853174{4}[source]
this depends on your choice of definition for "safe"
22. wahern ◴[] No.41853245{5}[source]
I don't have my copy of K&R handy, but this distinction has existed since the initial codification. From C89:

  3.1.2.5 Types

  [...] A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Source: C89 (draft) at https://port70.net/~nsz/c/c89/c89-draft.txt
23. renox ◴[] No.41853762{3}[source]
Well, Zig has ReleaseSafe for this. ReleaseFast is for using these UBs to generate the fastest code.
24. bsder ◴[] No.41853918{4}[source]
Nasal daemons for those of us of a slightly older vintage ...
25. TZubiri ◴[] No.41854150{5}[source]
Yeah, why have any type of human interaction in a forum when you can just refer your fellow brethren to the automaton.
replies(1): >>41854271 #
26. WalterBright ◴[] No.41854211{4}[source]
C:

    int foo(int a[]) { return a[5]; }

    int main() {
        int a[3];
        return foo(a);
    }

    > gcc test.c
    > ./a.out
Oops.

D:

    int foo(int[] a) { return a[5]; }

    int main() {
        int[3] a;
        return foo(a);
    }

    > ./cc array.d
    > ./array
    core.exception.ArrayIndexError@array.d(1): index [5] is out of bounds for array of length 3
Ah, Nirvana!

How to fix it for C:

https://www.digitalmars.com/articles/C-biggest-mistake.html

replies(2): >>41856518 #>>41859824 #
27. layer8 ◴[] No.41854271{6}[source]
I’m saying this because any explanation I could offer would provide less insight than the Google results.
replies(1): >>41854957 #
28. Maxatar ◴[] No.41854616{4}[source]
The C standard says, and I quote:

>Possible undefined behavior ranges from ignoring the situation completely with unpredictable results ... or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)

So a compiler is absolutely welcome to make undefined behavior safe. In fact, every compiler I know of, such as GCC, clang, and MSVC, has flags to make various undefined behavior safe, such as signed integer overflow, type punning, and casting function pointers to void pointers.

The Linux kernel is notorious for leveraging undefined behavior in C for which GCC guarantees specific and well defined behavior.

It looks like there is also the notion of unspecified behavior, which gives compilers a choice about the behavior and does not require compilers to document that choice or even choose consistently.

And finally there is what you bring up, which is implementation defined behavior which is defined as a subset of unspecified behavior in which compilers must document the choice.

29. trealira ◴[] No.41854734{3}[source]
> The DeathStation 9000 features a conforming C implementation which is known to catch all array bounds violations. ;)

That actually really does exist already with CHERI CPUs, whose pointers are tagged with "capabilities," which catch buffer overruns at runtime.

https://tratt.net/laurie/blog/2023/two_stories_for_what_is_c...

https://msrc.microsoft.com/blog/2022/01/an_armful_of_cheris/

30. Maxatar ◴[] No.41854910{4}[source]
The definition given by the C standard allows for safe undefined behavior.
31. TZubiri ◴[] No.41854957{7}[source]
Less insight, perhaps, but of higher quality, which is subjective.

I personally find that googling stuff provides little connection to the subject of study and is very impersonal, so I try to avoid it.

For example I did google the concept, and found this https://github.com/cousteaulecommandant/ds9k.

That is not trivial to parse. Bing presented the answer as authoritative, but if you look at the code it is really nothing. It seems to be a folklore concept, and as such it is much more aptly transmitted by speaking to a human and getting a live version than by googling an authoritative static answer.

32. Rusky ◴[] No.41855111{3}[source]
A worked example: https://github.com/pizlonator/llvm-project-deluge/blob/delug...
33. uecker ◴[] No.41856489{5}[source]
Oh, it also works for runtime length:

https://godbolt.org/z/PnaWWcK9o

replies(1): >>41856543 #
34. uecker ◴[] No.41856518{5}[source]
You need to take the address of the array instead of letting it decay and then size is encoded in the type:

  int foo(int (*a)[6]) { return (*a)[5]; }
  int main() {
    int a[3];
    return foo(&a);  /* diagnosed: int (*)[3] is not compatible with int (*)[6] */
  }
Or for run-time length:

  int foo(int n, int (*a)[n]) { return (\*a)[5]; }
  int main() {
    int a[3];
    return foo(ARRAY_SIZE(a), &a);
  }
  /app/example.c:4:38: runtime error: index 5 out of bounds for type 'int[n]'
https://godbolt.org/z/dxx7TsKbK\*
replies(2): >>41862243 #>>41869100 #
35. pjmlp ◴[] No.41856543{6}[source]
Now try that on a compiler without -fsanitize=bounds, yet full ISO C compliant.
replies(1): >>41857759 #
36. pornel ◴[] No.41856856[source]
There's finally a way to safely add two signed numbers, without tricky overflow checks that may trigger UB themselves!
37. uecker ◴[] No.41857759{7}[source]
You can still access the size which is what the parent was asking for. And please tell me how you would try this on an ISO compliant compiler for D.
replies(1): >>41859239 #
38. sdk77 ◴[] No.41857792[source]
The thing is though that even with array bounds checking built into the language, out of bounds access due to programming error can still be attempted. Only this time it's safer because an attacker can't use the bug (which still exists) to access memory outside of bounds. In any case, the program still doesn't work as intended (has bugs) because the programmer has attempted, or allowed the attempt, to access out of bounds memory.

Writing safe code is better than depending on safety features. Writing safe code is possible in any programming language, the only things required are good design principles and discipline (i.e. solid engineering).

replies(1): >>41862256 #
39. flohofwoe ◴[] No.41857923[source]
Except for C99 which added designated init and compound literals. With those it almost feels like a new language compared to C89 (and the C99 designated init feature is so well thought out that it still beats most similar initialization patterns in more recent languages, including C++, Rust and Zig - only Odin seems to "get it").
40. umanwizard ◴[] No.41858758{4}[source]
Something can be UB according to the standard, but defined (and safe) according to a particular implementation. Lots of stuff is UB according to the C or C++ standard but does something sensible in gcc and/or clang.
41. pjmlp ◴[] No.41859239{8}[source]
D has bounds checking, and isn't an ISO language.
42. ryao ◴[] No.41859824{5}[source]
This should be caught by CHERI.
43. WalterBright ◴[] No.41862243{6}[source]

  int foo(int n, int (*a)[n]) { return (\*a)[5]; }
  int main() {
    int a[3];
    return foo(ARRAY_SIZE(a), &a);
  }
That syntax is why array overflows remain the #1 problem with C bugs in shipped code. It isn't any better than:

  int foo(size_t n, int* a) { assert(5 < n); return a[5]; }
  int main() {
    int a[3];
    return foo(ARRAY_SIZE(a), a);
  }
as the array dimension has to be handled separately from the pointer.

Contrast with how simple it is in D:

    int foo(int[] a) { return a[5]; }
    int main() {
        int[3] a;
        return foo(a);
    }
and the proof is that array overflow bugs in the wild are stopped cold. It can be that simple and effective in C.
44. WalterBright ◴[] No.41862256{3}[source]
In practice in C, that does not work because array overflow bugs are still the #1 bug in shipped C code, by a wide margin.
45. marcodiego ◴[] No.41869100{6}[source]
\* what operator is this? I have never seen it. Where can I read about it?
replies(1): >>41871820 #
46. aw1621107 ◴[] No.41871820{7}[source]
My guess is that it was intended to escape the * since unescaped * in regular text on HN results in italics. Since the text in question is in a code block, though, that escaping is not needed.