
The C23 edition of Modern C

(gustedt.wordpress.com)

1. ralphc ◴[15 Oct 24 18:30 UTC] No.41851601[source]▶

>>41850017 (OP) #

How does "Modern" C compare safety-wise to Rust or Zig?

replies(4): >>41852048 #>>41852113 #>>41852498 #>>41856856 #

2. renox ◴[15 Oct 24 19:11 UTC] No.41852048[source]▶

>>41851601 (TP) #

You'd be surprised: Zig has one UB (Undefined Behaviour) that C doesn't have!

In ReleaseFast mode, unsigned overflow/underflow is undefined in Zig, whereas in C it wraps.

:-)

Of course, C has many UBs that Zig doesn't have, so C is far less safe than Zig, especially since you can use ReleaseSafe in Zig.

replies(2): >>41852363 #>>41852615 #

3. WalterBright ◴[15 Oct 24 19:18 UTC] No.41852113[source]▶

>>41851601 (TP) #

Modern C still promptly decays an array to a pointer, so no array bounds checking is possible.

D does not decay arrays, so D has array bounds checking.

Note that array overflow bugs are consistently the #1 problem with shipped C code, by a wide margin.

replies(2): >>41852316 #>>41857792 #

4. layer8 ◴[15 Oct 24 19:39 UTC] No.41852316[source]▶

>>41852113 #

> no array bounds checking is possible.

This isn’t strictly true: a C implementation is allowed to associate memory-range (or, more generally, pointer provenance) metadata with a pointer.

The DeathStation 9000 features a conforming C implementation which is known to catch all array bounds violations. ;)

replies(4): >>41852348 #>>41852932 #>>41854734 #>>41855111 #

5. uecker ◴[15 Oct 24 19:42 UTC] No.41852348{3}[source]▶

>>41852316 #

Right. Also, it might sound like array-to-pointer decay is forced onto the programmer. Instead, you can take the address of an array just fine without letting it decay. The type then preserves the length.
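
A minimal sketch of that point in C, for the compile-time case. The macro `POINTEE_LEN` and the function `sum3` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the array that p points to. The length is part of
   the pointer's type, so sizeof can recover it without a separate argument. */
#define POINTEE_LEN(p) (sizeof *(p) / sizeof **(p))

/* Hypothetical function: accepts only a pointer to an array of exactly 3 ints.
   Passing &b where b is an int[4] is a type mismatch the compiler diagnoses. */
static int sum3(int (*a)[3]) {
    /* The length is a compile-time constant taken from the parameter type. */
    static_assert(POINTEE_LEN(a) == 3, "length is encoded in the type");
    int total = 0;
    for (size_t i = 0; i < POINTEE_LEN(a); i++)
        total += (*a)[i];
    return total;
}
```

A caller writes `sum3(&arr)` rather than `sum3(arr)`, so the array never decays to a plain `int *`.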

replies(2): >>41853029 #>>41854211 #

6. uecker ◴[15 Oct 24 19:43 UTC] No.41852363[source]▶

>>41852048 #

UB does not automatically make things unsafe. You can have a compiler that implements safe defaults for most UB, and then it is not unsafe.

replies(4): >>41852548 #>>41853004 #>>41853083 #>>41853762 #

7. jandrese ◴[15 Oct 24 19:58 UTC] No.41852498[source]▶

>>41851601 (TP) #

Modern C is barely any different than older C. The language committee for C is extremely conservative, changes tend to happen only around the edges.

replies(1): >>41857923 #

8. ahoka ◴[15 Oct 24 20:03 UTC] No.41852548{3}[source]▶

>>41852363 #

By definition UB cannot be safe.

replies(3): >>41853174 #>>41854910 #>>41858758 #

9. secondcoming ◴[15 Oct 24 20:11 UTC] No.41852615[source]▶

>>41852048 #

Does C automatically wrap? I thought you need to pass `-fwrapv` to the compiler to ensure that.

replies(3): >>41852833 #>>41852848 #>>41852877 #

10. greyw ◴[15 Oct 24 20:36 UTC] No.41852833{3}[source]▶

>>41852615 #

Unsigned overflow wraps. Signed overflow is undefined behavior.

replies(1): >>41852909 #

11. renox ◴[15 Oct 24 20:38 UTC] No.41852848{3}[source]▶

>>41852615 #

-fwrapv is for signed integer overflow not unsigned.

replies(1): >>41853085 #

12. ◴[15 Oct 24 20:41 UTC] No.41852877{3}[source]▶

>>41852615 #

13. kbolino ◴[15 Oct 24 20:45 UTC] No.41852909{4}[source]▶

>>41852833 #

This distinction does not exist in K&R 2/e which documents ANSI C aka C89, but maybe it was added in a later version of the language (or didn't make it into the book)? According to K&R, all overflow is undefined.

replies(1): >>41853245 #

14. TZubiri ◴[15 Oct 24 20:47 UTC] No.41852932{3}[source]▶

>>41852316 #

"The DeathStation 9000"

The what now?

replies(2): >>41853018 #>>41853918 #

15. duped ◴[15 Oct 24 20:54 UTC] No.41853004{3}[source]▶

>>41852363 #

That's implementation defined behavior, not undefined behavior. Undefined behavior explicitly refers to something the compiler does not provide a definition for, including "safe defaults."

replies(2): >>41853169 #>>41854616 #

16. layer8 ◴[15 Oct 24 20:56 UTC] No.41853018{4}[source]▶

>>41852932 #

Google it.

replies(1): >>41854150 #

17. codr7 ◴[15 Oct 24 20:57 UTC] No.41853029{4}[source]▶

>>41852348 #

Nice when you know the length at compile time, which is rare in my experience.

The holy grail is runtime access to the length, which means an array would have to be backed by something more elaborate.
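
One way to read "something more elaborate" is a pointer that carries its length with it. The following is only a sketch under that assumption; `struct int_slice`, `SLICE_OF` and `slice_get` are hypothetical names, not part of any standard or library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the data pointer and its element count
   travel together, so the length is available at run time. */
struct int_slice {
    int   *ptr;
    size_t len;
};

/* Build a slice from a true array (not from a pointer), capturing its length. */
#define SLICE_OF(arr) ((struct int_slice){ (arr), sizeof (arr) / sizeof (arr)[0] })

/* Bounds-checked read: returns false and leaves *out untouched
   when i is out of range. */
static bool slice_get(struct int_slice s, size_t i, int *out) {
    assert(out != NULL);
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}
```

The check lives in the accessor, so it only helps callers who go through `slice_get`; nothing stops code from indexing `s.ptr` directly.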

replies(1): >>41856489 #

18. ◴[15 Oct 24 21:02 UTC] No.41853083{3}[source]▶

>>41852363 #

19. sp1rit ◴[15 Oct 24 21:02 UTC] No.41853085{4}[source]▶

>>41852848 #

Yes, as unsigned overflow is fine by default. AFAIK the issue was originally that there were still machines that used ones' complement for representing negative integers instead of the now customary two's complement.

20. fuhsnn ◴[15 Oct 24 21:11 UTC] No.41853169{4}[source]▶

>>41853004 #

Compilers are not prohibited from providing their own definition for UB; that's how UBSan exists.

21. marssaxman ◴[15 Oct 24 21:11 UTC] No.41853174{4}[source]▶

>>41852548 #

this depends on your choice of definition for "safe"

22. wahern ◴[15 Oct 24 21:21 UTC] No.41853245{5}[source]▶

>>41852909 #

I don't have my copy of K&R handy, but this distinction has existed since the initial codification. From C89:

  3.1.2.5 Types

  [...] A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.

Source: C89 (draft) at https://port70.net/~nsz/c/c89/c89-draft.txt
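
The modulo rule quoted above can be checked directly. A minimal sketch in C; the helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so this holds
   at compile time on every conforming implementation. */
static_assert(UINT_MAX + 1u == 0u, "unsigned arithmetic wraps");

/* Hypothetical helper: well defined for every pair of inputs,
   because the result wraps instead of overflowing. */
static unsigned wrapping_add_u(unsigned a, unsigned b) {
    return a + b;
}

/* Hypothetical helper: reports whether a + b would wrap. The test itself
   uses only unsigned arithmetic that cannot wrap (UINT_MAX - b >= 0). */
static int add_would_wrap_u(unsigned a, unsigned b) {
    return a > UINT_MAX - b;
}
```

The same code with `int` operands would not be safe: signed overflow is undefined, which is the case `-fwrapv` addresses.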

23. renox ◴[15 Oct 24 22:24 UTC] No.41853762{3}[source]▶

>>41852363 #

Well, Zig has ReleaseSafe for this. ReleaseFast is for using these UBs to generate the fastest code.

24. bsder ◴[15 Oct 24 22:49 UTC] No.41853918{4}[source]▶

>>41852932 #

Nasal daemons for those of us of a slightly older vintage ...

25. TZubiri ◴[15 Oct 24 23:26 UTC] No.41854150{5}[source]▶

>>41853018 #

Yeah, why have any type of human interaction in a forum when you can just refer your fellow brethren to the automaton.

replies(1): >>41854271 #

26. WalterBright ◴[15 Oct 24 23:36 UTC] No.41854211{4}[source]▶

>>41852348 #

C:

    int foo(int a[]) { return a[5]; }

    int main() {
        int a[3];
        return foo(a);
    }

    > gcc test.c
    > ./a.out

Oops.

D:

    int foo(int[] a) { return a[5]; }

    int main() {
        int[3] a;
        return foo(a);
    }

    > ./cc array.d
    > ./array
    core.exception.ArrayIndexError@array.d(1): index [5] is out of bounds for array of length 3

Ah, Nirvana!

How to fix it for C:

https://www.digitalmars.com/articles/C-biggest-mistake.html

replies(2): >>41856518 #>>41859824 #

27. layer8 ◴[15 Oct 24 23:50 UTC] No.41854271{6}[source]▶

>>41854150 #

I’m saying this because any explanation I could offer would provide less insight than the Google results.

replies(1): >>41854957 #

28. Maxatar ◴[16 Oct 24 00:52 UTC] No.41854616{4}[source]▶

>>41853004 #

The C standard says, and I quote:

> Possible undefined behavior ranges from ignoring the situation completely with unpredictable results ... or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)

So a compiler is absolutely welcome to make undefined behavior safe. In fact every compiler I know of, such as GCC, clang, MSVC has flags to make various undefined behavior safe, such as signed integer overflow, type punning, casting function pointers to void pointers.

The Linux kernel is notorious for leveraging undefined behavior in C for which GCC guarantees specific and well defined behavior.

It looks like there is also the notion of unspecified behavior, which gives compilers a choice about the behavior and does not require compilers to document that choice or even choose consistently.

And finally there is what you bring up, which is implementation defined behavior which is defined as a subset of unspecified behavior in which compilers must document the choice.

29. trealira ◴[16 Oct 24 01:20 UTC] No.41854734{3}[source]▶

>>41852316 #

> The DeathStation 9000 features a conforming C implementation which is known to catch all array bounds violations. ;)

That actually really does exist already with CHERI CPUs, whose pointers are tagged with "capabilities," which catch buffer overruns at runtime.

https://tratt.net/laurie/blog/2023/two_stories_for_what_is_c...

https://msrc.microsoft.com/blog/2022/01/an_armful_of_cheris/

30. Maxatar ◴[16 Oct 24 01:58 UTC] No.41854910{4}[source]▶

>>41852548 #

The definition given by the C standard allows for safe undefined behavior.

31. TZubiri ◴[16 Oct 24 02:10 UTC] No.41854957{7}[source]▶

>>41854271 #

Less insight, perhaps, but of higher quality, which is subjective.

I personally find that googling stuff provides little connection to the subject of study and is very impersonal, so I try to avoid it.

For example I did google the concept, and found this https://github.com/cousteaulecommandant/ds9k.

That is not trivial to parse. Bing posited the answer as authoritative, but if you look at the code it is really nothing. It seems to be a folklore concept, and as such it is much more aptly transmitted by speaking to a human and getting a live version than by googling an authoritative static answer.

32. Rusky ◴[16 Oct 24 02:44 UTC] No.41855111{3}[source]▶

>>41852316 #

A worked example: https://github.com/pizlonator/llvm-project-deluge/blob/delug...

33. uecker ◴[16 Oct 24 07:28 UTC] No.41856489{5}[source]▶

>>41853029 #

Oh, it also works for runtime length:

https://godbolt.org/z/PnaWWcK9o

replies(1): >>41856543 #

34. uecker ◴[16 Oct 24 07:33 UTC] No.41856518{5}[source]▶

>>41854211 #

You need to take the address of the array instead of letting it decay and then size is encoded in the type:

  int foo(int (*a)[6]) { return (*a)[5]; }
  int main() {
    int a[3];
    return foo(&a);
  }

Or for run-time length:

  int foo(int n, int (*a)[n]) { return (\*a)[5]; }
  int main() {
    int a[3];
    return foo(ARRAY_SIZE(a), &a);
  }
  /app/example.c:4:38: runtime error: index 5 out of bounds for type 'int[n]'

https://godbolt.org/z/dxx7TsKbK\*

replies(2): >>41862243 #>>41869100 #

35. pjmlp ◴[16 Oct 24 07:38 UTC] No.41856543{6}[source]▶

>>41856489 #

Now try that on a compiler without -fsanitize=bounds that is nonetheless fully ISO C compliant.

replies(1): >>41857759 #

36. pornel ◴[16 Oct 24 08:32 UTC] No.41856856[source]▶

>>41851601 (TP) #

There's finally a way to safely add two signed numbers, without tricky overflow checks that may trigger UB themselves!
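
The comment does not name the feature; it presumably refers to the checked-arithmetic macros `ckd_add`, `ckd_sub` and `ckd_mul` that C23 adds in `<stdckdint.h>`. A minimal sketch, assuming that reading. The wrapper `checked_add_int` is a hypothetical name, and the fallback branch is only there so the sketch still compiles on toolchains that lack the C23 header.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Pull in the C23 header only when the toolchain ships it. */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif

/* Hypothetical wrapper: returns true if a + b does not fit in an int.
   When it returns false, the sum has been stored in *out.
   When it returns true, do not rely on the contents of *out. */
static bool checked_add_int(int a, int b, int *out) {
    assert(out != NULL);
#if defined(ckd_add)
    /* C23: ckd_add returns true when the mathematical result does not fit. */
    return ckd_add(out, a, b);
#else
    /* Pre-C23 fallback: compare against the limits before adding,
       so the check itself never overflows. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return true;
    *out = a + b;
    return false;
#endif
}
```

The fallback branch is the kind of "tricky overflow check" the comment alludes to: writing `a + b < a` instead would itself overflow.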

37. uecker ◴[16 Oct 24 11:10 UTC] No.41857759{7}[source]▶

>>41856543 #

You can still access the size which is what the parent was asking for. And please tell me how you would try this on an ISO compliant compiler for D.
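
A minimal sketch of what "access the size" can look like without any sanitizer, assuming a compiler that supports variably modified types (GCC and Clang do). The helper names `runtime_len` and `get_or` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* a points to an array whose length n is only known at run time,
   so sizeof *a is computed at run time from n. */
static size_t runtime_len(size_t n, int (*a)[n]) {
    return sizeof *a / sizeof **a;
}

/* Hand-written bounds check built on that length; returns fallback
   when i is out of range instead of reading past the array. */
static int get_or(size_t n, int (*a)[n], size_t i, int fallback) {
    assert(a != NULL);
    return i < runtime_len(n, a) ? (*a)[i] : fallback;
}
```

The length is whatever the caller passes as `n`; the language does not verify that it matches the real array, which is where a tool such as `-fsanitize=bounds` comes in.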

replies(1): >>41859239 #

38. sdk77 ◴[16 Oct 24 11:15 UTC] No.41857792[source]▶

>>41852113 #

The thing is though that even with array bounds checking built into the language, out of bounds access due to programming error can still be attempted. Only this time it's safer because an attacker can't use the bug (which still exists) to access memory outside of bounds. In any case, the program still doesn't work as intended (has bugs) because the programmer has attempted, or allowed the attempt, to access out of bounds memory.

Writing safe code is better than depending on safety features. Writing safe code is possible in any programming language, the only things required are good design principles and discipline (i.e. solid engineering).

replies(1): >>41862256 #

39. flohofwoe ◴[16 Oct 24 11:36 UTC] No.41857923[source]▶

>>41852498 #

Except for C99 which added designated init and compound literals. With those it almost feels like a new language compared to C89 (and the C99 designated init feature is so well thought out that it still beats most similar initialization patterns in more recent languages, including C++, Rust and Zig - only Odin seems to "get it").
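
For readers who have not used the two C99 features mentioned, a short sketch. The struct `window_desc` and the functions around it are hypothetical and invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description struct. */
struct window_desc {
    int width;
    int height;
    const char *title;
    int fullscreen;
};

/* Hypothetical consumer of the struct. */
static int window_area(const struct window_desc *desc) {
    assert(desc != NULL);
    return desc->width * desc->height;
}

/* Designated initializers name the members they set, in any order;
   members not mentioned (here: fullscreen) are zero-initialized. */
static const struct window_desc default_window = {
    .title  = "demo",
    .width  = 640,
    .height = 480,
};

/* A compound literal creates an unnamed struct object in place,
   so no temporary variable is needed at the call site. */
static int small_area(void) {
    return window_area(&(struct window_desc){ .width = 2, .height = 3 });
}
```

Adding a member to `window_desc` later does not break either initializer, since neither depends on member order.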

40. umanwizard ◴[16 Oct 24 13:21 UTC] No.41858758{4}[source]▶

>>41852548 #

Something can be UB according to the standard, but defined (and safe) according to a particular implementation. Lots of stuff is UB according to the C or C++ standard but does something sensible in gcc and/or clang.

41. pjmlp ◴[16 Oct 24 14:13 UTC] No.41859239{8}[source]▶

>>41857759 #

D has bounds checking, and isn't an ISO language.

42. ryao ◴[16 Oct 24 14:57 UTC] No.41859824{5}[source]▶

>>41854211 #

This should be caught by CHERI.

43. WalterBright ◴[16 Oct 24 18:35 UTC] No.41862243{6}[source]▶

>>41856518 #

  int foo(int n, int (*a)[n]) { return (\*a)[5]; }
  int main() {
    int a[3];
    return foo(ARRAY_SIZE(a), &a);
  }

That syntax is why array overflows remain the #1 problem with C bugs in shipped code. It isn't any better than:

  int foo(size_t n, int* a) { assert(5 < n); return a[5]; }
  int main() {
    int a[3];
    return foo(ARRAY_SIZE(a), a);
  }

as the array dimension has to be handled separately from the pointer.

Contrast with how simple it is in D:

    int foo(int[] a) { return a[5]; }
    int main() {
        int[3] a;
        return foo(a);
    }

and the proof is that array overflow bugs in the wild are stopped cold. It can be that simple and effective in C.

44. WalterBright ◴[16 Oct 24 18:36 UTC] No.41862256{3}[source]▶

>>41857792 #

In practice in C, that does not work because array overflow bugs are still the #1 bug in shipped C code, by a wide margin.

45. marcodiego ◴[17 Oct 24 12:40 UTC] No.41869100{6}[source]▶

>>41856518 #

\* what operator is this? I have never seen it. Where can I read about it?

replies(1): >>41871820 #

46. aw1621107 ◴[17 Oct 24 17:42 UTC] No.41871820{7}[source]▶

>>41869100 #

My guess is that it was intended to escape the * since unescaped * in regular text on HN results in italics. Since the text in question is in a code block, though, that escaping is not needed.
