The provenance memory model for C

(gustedt.wordpress.com)

1. zombot ◴[30 Jun 25 12:17 UTC] No.44422335[source]▶

Does C allow Unicode identifiers now, or is that pseudo code? The code snippets also contain `&amp;`, so something definitely went wrong with the transcoding to HTML.

replies(4): >>44422382 #>>44422416 #>>44422634 #>>44424896 #

2. tialaramex ◴[30 Jun 25 12:20 UTC] No.44422367[source]▶

>>44421185 (OP) #

Presumably this was converted from markdown or similar and the conversion partly failed or the input was broken.

From the PVI section onward it seems to recover, but if the author sees this please fix and re-convert your post.

[Edited, nope, there are more errors further in the text, this needed proper proofreading before it was posted, I can somewhat struggle through because I already know this topic but if this was intended to introduce newcomers it's probably very confusing]

replies(1): >>44425759 #

3. qsort ◴[30 Jun 25 12:21 UTC] No.44422382[source]▶

>>44422335 #

Quoting cppreference:

An identifier is an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and Unicode characters specified using \u and \U escape notation(since C99), of class XID_Continue(since C23). A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode non-digit character(since C99)(until C23), or Unicode character of class XID_Start)(since C23)). Identifiers are case-sensitive (lowercase and uppercase letters are distinct). Every identifier must conform to Normalization Form C.(since C23)

In practice depends on the compiler.

replies(1): >>44422453 #

4. lioeters ◴[30 Jun 25 12:23 UTC] No.44422394[source]▶

>>44421185 (OP) #

Looks like a code block didn't get closed properly, before this phrase:

> the functions `recip` and `recip⁺` and not equivalent

Several paragraphs after this got swallowed by the code block.

Edit: Oh, I didn't realize the article is by the author of the book, Modern C. I've seen it recommended in many places.

> The C23 edition of Modern C is now available for free download from https://hal.inria.fr/hal-02383654

replies(3): >>44422457 #>>44422685 #>>44423321 #

5. gavinray ◴[30 Jun 25 12:25 UTC] No.44422412[source]▶

>>44421185 (OP) #

Also of interest to folks looking at this might be TySan, the recently-merged LLVM Type-Based Aliasing sanitizer:

https://clang.llvm.org/docs/TypeSanitizer.html

https://www.phoronix.com/news/LLVM-Merge-TySan-Type-Sanitize...

replies(2): >>44426467 #>>44430826 #

6. unwind ◴[30 Jun 25 12:26 UTC] No.44422416[source]▶

>>44422335 #

I can't even view the post, I just get some kind of content-management-system-like view with the page as JSON or something, in pink-on-white. I'm super confused. :|

The answer to your question seems to (still) be "no".

7. dgrunwald ◴[30 Jun 25 12:30 UTC] No.44422453{3}[source]▶

>>44422382 #

But the source character set remains implementation-defined, so compilers do not have to directly support unicode names, only the escape notation.

Definitely a questionable choice to throw off readers with unicode weirdness in the very first code example.
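
A minimal sketch of the escape notation mentioned above, assuming a C99-or-later compiler that accepts universal character names in identifiers. The function name `circle_area` is hypothetical and only for illustration; `\u03C0` is the escape for the Greek letter pi, so the source file itself stays pure ASCII.

```c
#include <assert.h>

/* \u03C0 is the universal character name for the Greek letter pi.
 * Spelling the identifier with the escape keeps the file ASCII-only. */
static const double \u03C0 = 3.141592653589793;

/* Hypothetical helper: area of a circle with radius r. */
double circle_area(double r)
{
    assert(r >= 0.0);
    return \u03C0 * r * r;
}
```

Whether the same identifier may also be typed directly as a non-ASCII character depends on the compiler and its source character set, which is the point made above.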

replies(1): >>44422534 #

8. johnisgood ◴[30 Jun 25 12:30 UTC] No.44422457[source]▶

>>44422394 #

It is a great book. I prefer the second edition, though, not the latest one with what I call "bloated C".

replies(1): >>44425842 #

9. qsort ◴[30 Jun 25 12:39 UTC] No.44422534{4}[source]▶

>>44422453 #

If it were up to me, anything outside the basic character set in a source file would be a syntax error, I'm simply reporting what the spec says.

replies(2): >>44422647 #>>44423260 #

10. pjmlp ◴[30 Jun 25 12:50 UTC] No.44422634[source]▶

>>44422335 #

Besides the sibling comment on C23, it does work fine on GCC.

https://godbolt.org/z/qKejzc1Kb

Whereas clang loudly complains,

https://godbolt.org/z/qWrccWzYW

11. ncruces ◴[30 Jun 25 12:51 UTC] No.44422647{5}[source]▶

>>44422534 #

I use unicode for math in comments, and I think it makes certain complicated formulas far more readable.

replies(1): >>44424358 #

12. shakabrah ◴[30 Jun 25 12:55 UTC] No.44422685[source]▶

>>44422394 #

It made immediate sense to me it was Jen once I saw the code samples given

13. jvanderbot ◴[30 Jun 25 12:55 UTC] No.44422693[source]▶

>>44421185 (OP) #

I love Rust, but I miss C. If C can be updated to make it generally socially acceptable for new projects, I'd happily go back for some decent subset of things I do. However, there's a lot of anxiety and even angst around using C in production code.

replies(6): >>44422779 #>>44423128 #>>44423371 #>>44423771 #>>44425323 #>>44433479 #

14. mikewarot ◴[30 Jun 25 13:02 UTC] No.44422779[source]▶

>>44422693 #

If you can stomach the occasional Begin and End, and a far less confusing pointer syntax, Pascal might be the language for you. Free Pascal has some great string handling, so you never have to worry about allocating and freeing them, and they can store gigabytes of text, even Unicode. ;-)

replies(2): >>44422784 #>>44422867 #

15. tgv ◴[30 Jun 25 13:02 UTC] No.44422784{3}[source]▶

>>44422779 #

Or try Ada.

16. jvanderbot ◴[30 Jun 25 13:11 UTC] No.44422867{3}[source]▶

>>44422779 #

If my fellow devs cringe at C, imagine their reaction to Pascal

replies(1): >>44423210 #

17. briandw ◴[30 Jun 25 13:31 UTC] No.44423073[source]▶

>>44421185 (OP) #

The code blocks are very difficult to read on this page. I had ChatGPT O3 rewrite this in a more accessible format. https://chatgpt.com/share/68629096-0624-8005-846f-7c0d655061...

replies(1): >>44424781 #

18. flohofwoe ◴[30 Jun 25 13:37 UTC] No.44423128[source]▶

>>44422693 #

> to make it generally socially acceptable for new projects...

Or better yet, don't let 'social pressure' influence your choice of programming language ;)

If your workplace has a clear rule to not use memory-unsafe languages for production code that's a different matter of course. But nothing can stop you from writing C code as a hobby - C99 and later is a very enjoyable and fun language.

replies(3): >>44423284 #>>44424118 #>>44424932 #

19. mikewarot ◴[30 Jun 25 13:45 UTC] No.44423210{4}[source]▶

>>44422867 #

C has all the things to hate in a programming language

  CaSe Sensitivity
  Weird pointer syntax
  Lack of a separate assignment token
  Null terminated strings
  Macros - the evil scourge of the universe

On the plus side, it's installed everywhere, and it's not indent sensitive

replies(5): >>44423261 #>>44424125 #>>44424253 #>>44424648 #>>44430684 #

20. guipsp ◴[30 Jun 25 13:50 UTC] No.44423260{5}[source]▶

>>44422534 #

What a "basic character set" is depends on locale

replies(2): >>44423859 #>>44424381 #

21. jvanderbot ◴[30 Jun 25 13:50 UTC] No.44423261{5}[source]▶

>>44423210 #

At this point, you're talking to someone who isn't here

22. xxs ◴[30 Jun 25 13:52 UTC] No.44423284{3}[source]▶

>>44423128 #

I was about to reply that no amount of pressure can tell me how to program. C was totally fine for the ESP32.

23. zmodem ◴[30 Jun 25 13:55 UTC] No.44423321[source]▶

>>44422394 #

> Looks like a code block didn't get closed properly

This seems to have been fixed now.

replies(1): >>44426141 #

24. bnferguson ◴[30 Jun 25 13:59 UTC] No.44423371[source]▶

>>44422693 #

Feels like Zig is starting to fill that role in some ways. Fewer sharp edges and a bit more safety than C, more modern approach, and even interops really well with C (even being possible to mix the two). Know a couple Rust devs that have said it seems to scratch that C itch while being more modern.

Of course it's still really nice to just have C itself being updated into something that's nicer to work with and easier to write safely, but Zig seems to be a decent other option.

replies(3): >>44423806 #>>44424327 #>>44425774 #

25. modeless ◴[30 Jun 25 14:28 UTC] No.44423771[source]▶

>>44422693 #

Fil-C is a modified version of Clang that makes C and C++ memory safe. It supports things you wouldn't expect to work like signal handling or setjmp/longjmp. It can compile real C projects like SQLite and OpenSSL with minimal to no changes, today. https://github.com/pizlonator/llvm-project-deluge/blob/delug...

replies(1): >>44427476 #

26. pjmlp ◴[30 Jun 25 14:30 UTC] No.44423806{3}[source]▶

>>44423371 #

As usual, the remark that much of Zig's safety over C has been present since the late 1970s in languages like Modula-2, Object Pascal and Ada, but sadly they weren't born with curly brackets, nor did they bring a free OS to the uni party.

27. qsort ◴[30 Jun 25 14:34 UTC] No.44423859{6}[source]▶

>>44423260 #

https://en.cppreference.com/w/c/language/charset.html

28. TimorousBestie ◴[30 Jun 25 14:54 UTC] No.44424118{3}[source]▶

>>44423128 #

> Or better yet, don't let 'social pressure' influence your choice of programming language ;)

It’s hard. Programming is a social discipline, and the more people who work in a language, the more love it gets.

replies(1): >>44425134 #

29. ioasuncvinvaer ◴[30 Jun 25 14:55 UTC] No.44424125{5}[source]▶

>>44423210 #

Except for null terminated strings these don't seem like major issues to me. Can you elaborate?

30. b0a04gl ◴[30 Jun 25 15:02 UTC] No.44424206[source]▶

>>44421185 (OP) #

provenance model basically turns memory back into a typed value. finally malloc won't just be a dumb number generator, it'll act more like a capability issuer. and access is not 'is this address in range' anymore, but 'does this pointer have valid provenance'. way more deterministic, decouples gcc -Wall

replies(1): >>44424492 #

31. 1718627440 ◴[30 Jun 25 15:06 UTC] No.44424253{5}[source]▶

>>44423210 #

> Lack of a separate assignment token

What does that mean?

replies(1): >>44424637 #

32. dnautics ◴[30 Jun 25 15:12 UTC] No.44424327{3}[source]▶

>>44423371 #

(self-promotion) in principle one should be able to implement a fairly mature pointer provenance checker for zig, without changing the language. A basic proof of concept (don't use this, branches and loops have not been implemented yet):

https://www.youtube.com/watch?v=ZY_Z-aGbYm8

33. kzrdude ◴[30 Jun 25 15:14 UTC] No.44424358{6}[source]▶

>>44422647 #

I've just been learning pinyin notation, so now i think the variable řₚ should have a value that first goes down a bit and then up.

replies(1): >>44424558 #

34. account42 ◴[30 Jun 25 15:16 UTC] No.44424381{6}[source]▶

>>44423260 #

Anything except US-ASCII in source code outside comments and string constants should be a syntax error.

replies(1): >>44424740 #

35. HexDecOctBin ◴[30 Jun 25 15:28 UTC] No.44424492[source]▶

>>44424206 #

Will this create more nasal demons? I always disable strict aliasing, and it's not clear to me after reading the whole article whether provenance is about making sane code illegal, or making previously illegal sane code legal.

replies(3): >>44424935 #>>44425068 #>>44425399 #

36. zelphirkalt ◴[30 Jun 25 15:33 UTC] No.44424558{7}[source]▶

>>44424358 #

I am not sure it is a good idea to mix such specific phonetic script ideas about diacritic marks with the behavior of the program over time. Even considering the shape, it does not align with the idea of first down a little, then up a lot.

replies(1): >>44432063 #

37. kbolino ◴[30 Jun 25 15:41 UTC] No.44424637{6}[source]▶

>>44424253 #

Assignment is = which is too close to equality == and thus has been the source of bugs in the past, especially since C treats assignment as an expression and coerces lots of non-boolean values to true/false wherever a condition is expected (if, while, for). Most compilers warn about this at least nowadays.
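
A small sketch of the bug class described above. The function names are hypothetical; the first one is wrong on purpose.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy on purpose: `=` stores 0 into *flag, and the value of the
 * assignment expression (0) is then used as the condition. */
bool is_zero_buggy(int *flag)
{
    assert(flag);
    if ((*flag = 0)) {   /* doubled parentheses hide the usual warning */
        return true;
    }
    return false;        /* always taken, and *flag has been overwritten */
}

/* Intended version: `==` compares and leaves *flag untouched. */
bool is_zero(const int *flag)
{
    assert(flag);
    return *flag == 0;
}
```

Written with single parentheses, `if (*flag = 0)` is what GCC and Clang flag under `-Wall`. The doubled parentheses are the conventional way to say the assignment is intended, which is why they are used here to keep the deliberately wrong version compiling quietly.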

replies(1): >>44427573 #

38. zelphirkalt ◴[30 Jun 25 15:42 UTC] No.44424648{5}[source]▶

>>44423210 #

You mean "mere string replacement macros, instead of hygienic macros", of course : )

39. guipsp ◴[30 Jun 25 15:51 UTC] No.44424740{7}[source]▶

>>44424381 #

You are aware other languages exist? Some of which don't even use the Latin script?

replies(3): >>44424911 #>>44426840 #>>44431566 #

40. cenobyte ◴[30 Jun 25 15:56 UTC] No.44424781[source]▶

>>44423073 #

So much better. Thank you!

41. ◴[30 Jun 25 15:57 UTC] No.44424796[source]▶

>>44421185 (OP) #

42. cenobyte ◴[30 Jun 25 15:59 UTC] No.44424820[source]▶

>>44421185 (OP) #

Please fix the code in your post.

43. eqvinox ◴[30 Jun 25 16:03 UTC] No.44424859[source]▶

>>44421185 (OP) #

Using the "register" storage class feels really alien for C code written in 2025…

replies(1): >>44428659 #

44. smcameron ◴[30 Jun 25 16:04 UTC] No.44424882[source]▶

>>44421185 (OP) #

Ugh. Are unicode variable names allowed in C now? That's horrific.

replies(5): >>44424985 #>>44425020 #>>44425869 #>>44426140 #>>44426336 #

45. Y_Y ◴[30 Jun 25 16:06 UTC] No.44424896[source]▶

>>44422335 #

Implementation-defined until C99, explicitly possible via UCNs since C99, possible with explicit encoding since C23, but literals are still implementation-defined.

46. Y_Y ◴[30 Jun 25 16:07 UTC] No.44424911{8}[source]▶

>>44424740 #

What; like APL‽

47. Y_Y ◴[30 Jun 25 16:08 UTC] No.44424932{3}[source]▶

>>44423128 #

I don't want to summon WB, but honest-to-god, D is a good middle ground here.

48. jcranmer ◴[30 Jun 25 16:09 UTC] No.44424935{3}[source]▶

>>44424492 #

All C compilers have some notion of pointer provenance embedded in them, and this is true going back decades.

The problem is that the documented definitions of pointer provenance (which generally amount to "you must somehow have a data dependency from the original object definition (e.g., malloc)") aren't really upheld by the optimizer, and the effective definition of the optimizer is generally internally inconsistent because people don't think about side effects of pointer-to-integer conversion. The one-past-the-end pointer being equal (but of different provenance) to a different object is a particularly vexatious case.

The definition given in TS6010 is generally the closest you'll get to a formal description of the behavior that optimizers are already generally following, except for cases that are clearly agreed to be bugs. The biggest problem is that it makes pointer-to-int an operation with side effects that need to be preserved, and compilers today generally fail to preserve those side effects (especially when pointer-to-int conversion happens more as an implicit operation).

The practical effect of provenance--that you can't magic a pointer to an object out of thin air--has always been true. This is largely trying to clarify what it means to actually magic a pointer out of thin air; it's not a perfect answer, but it's the best answer anyone's come up with to date.
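
A rough sketch of the two situations mentioned above, assuming an implementation that provides `uintptr_t`. The function names are hypothetical, and the comments describe the model as it is understood in this thread, not normative wording.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips a pointer through an integer. Under the model discussed
 * here, the cast to uintptr_t "exposes" the object, which is what lets
 * the pointer rebuilt from the integer be used to access it again. */
int write_via_integer(int *p, int value)
{
    assert(p != 0);
    uintptr_t bits = (uintptr_t)p;   /* pointer-to-int: exposure */
    int *q = (int *)bits;            /* int-to-pointer: recovers a usable pointer */
    *q = value;
    return *p;
}

/* Compares addresses only. Two pointers can compare equal here and
 * still have different provenance, e.g. one-past-the-end of one array
 * and the start of a neighbouring one. */
int same_address(const void *a, const void *b)
{
    return (uintptr_t)a == (uintptr_t)b;
}
```

For two separate arrays `int a[1], b[1];`, `same_address(a + 1, b)` may or may not return 1 depending on how the implementation lays them out. Even when it does, writing through `a + 1` is still not a valid way to reach `b`.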

49. mananaysiempre ◴[30 Jun 25 16:14 UTC] No.44424985[source]▶

>>44424882 #

“Now” as in since C99, twenty-five years ago, yes. (It seemed like a good idea at the time.)

replies(2): >>44425080 #>>44425758 #

50. 1over137 ◴[30 Jun 25 16:17 UTC] No.44425020[source]▶

>>44424882 #

Horrific? You might not think so if your (human) language used a different alphabet.

replies(3): >>44425223 #>>44425239 #>>44425432 #

51. layer8 ◴[30 Jun 25 16:21 UTC] No.44425068{3}[source]▶

>>44424492 #

This is basically a formalization of the general understanding one already had when reading the C standard thoroughly 25 years ago. At least I was nodding along throughout the article. It cleans up the parts where the standard was too imprecise and handwavy.

52. 90s_dev ◴[30 Jun 25 16:22 UTC] No.44425080{3}[source]▶

>>44424985 #

53. spauldo ◴[30 Jun 25 16:29 UTC] No.44425134{4}[source]▶

>>44424118 #

If you're on UNIX or working in the embedded space, C is still everywhere and gets lots of love. C tends to get lots of libraries anyway because everything can FFI to it.

54. ajross ◴[30 Jun 25 16:36 UTC] No.44425223{3}[source]▶

>>44425020 #

Little to no source code is written for single (human) language development teams. Sure, everyone would like the ability to write source code in their native language. That's natural.

Literally no one, anywhere, wants to be forced to read source written in a language they can't read (or more specifically in this case: written in glyphs they can't even produce on their keyboard). That idea, for almost everyone, seems "horrific", yeah.

So a lingua franca is a firm requirement for modern software development outside of extremely specific environments (FSB malware authors probably don't care about anyone else reading their Cyrillic variable names, etc...). Must it be ASCII-encoded English? No. But that's what the market has picked and most people seem happy enough with it.

replies(1): >>44425925 #

55. eqvinox ◴[30 Jun 25 16:37 UTC] No.44425239{3}[source]▶

>>44425020 #

Yes but also no. The thing about software is that 90% of it is not culturally bound. If you're writing, say, some tax reporting tool, a grammar reference, or something religious… sure, it makes sense to write that in your language. So, yeah, C should support that.

However, everything else, from spreadsheet software to CAD tools to OS kernels to JavaScript frameworks is universal across cultures and languages. And for better or for worse (I'm not a native English speaker either), the world has gone with English for a lot of code commons.

And the thing with the examples in that post isn't about supporting language diversity, it's math symbols which are no one's native language. And you pretty much can't type them on any keyboard. Which really makes it a rather poor flex IMHO. Did the author reconfigure their keyboard layout for that specific math use case? It can't generically cover "all of math" either. Or did they copy&paste it around? That's just silly.

[…could some of the downvoters explain why they're downvoting?]

replies(2): >>44426108 #>>44435421 #

56. uecker ◴[30 Jun 25 16:45 UTC] No.44425323[source]▶

>>44422693 #

Do you really love Rust, or do you feel pressured to say so?

replies(2): >>44425848 #>>44428768 #

57. Diggsey ◴[30 Jun 25 16:50 UTC] No.44425399{3}[source]▶

>>44424492 #

It's standardizing the contract between the programmer and the compiler.

Previously a lot of C code was non-portable because it relied on behaviour that wasn't defined as part of the standard. If you compiled it with the wrong compiler or the wrong flags you might get miscompilations.

The provenance memory model draws a line in the sand and says "all C code on this side of the line should behave in this well defined way". Any optimizations implemented by compiler authors which would miscompile code on that side of the line would need to be disabled.

Assuming the authors of the model have done a good job, the impact on compiler optimizations should be minimized whilst making as much existing C code fall on the "right" side of the line as possible.

For new C code it provides programmers a way to write useful code that is also portable, since we now have a line that we can all hopefully agree on.

58. Joker_vD ◴[30 Jun 25 16:53 UTC] No.44425432{3}[source]▶

>>44425020 #

My language uses Cyrillic and I personally prefer English-based keywords and variable names precisely because they are not words of my (human) language. It introduces an easy and obvious distinction between the machine-oriented and the human-oriented.

replies(2): >>44426725 #>>44435410 #

59. Joker_vD ◴[30 Jun 25 17:01 UTC] No.44425547[source]▶

>>44421185 (OP) #

> Here the term "same representation and alignment" covers for example the possibility to look at [...] one would be a structure and the other would be another structure that sits at the beginning of the first.

Does it? It is quite simple for a struct A that has struct B as its first member to have radically different alignment:

    struct B { char x; };

    struct A { struct B b; long long y; };

Also, accidentally coinciding pointers are nothing "rare" because all objects are allowed to be treated as 1-element arrays: so any pointer to an e.g. struct field is also a pointer one-past the previous field of this struct; also, malloc() allocations easily may produce "touching" objects. So thanks for allowing implementations to not have padding between almost every two objects, I guess.

replies(1): >>44426264 #

60. kevincox ◴[30 Jun 25 17:20 UTC] No.44425758{3}[source]▶

>>44424985 #

Being able to program in languages that don't fit into ASCII is a good idea. Using one-character variable names is a bad idea.

replies(2): >>44425908 #>>44428235 #

61. gustedt ◴[30 Jun 25 17:20 UTC] No.44425759[source]▶

>>44422367 #

The problem is that wordpress changes these things once you edit some part. I will probably regenerate the whole thing.

62. purplesyringa ◴[30 Jun 25 17:22 UTC] No.44425774{3}[source]▶

>>44423371 #

How close are Zig's safety guarantees to Rust's? Honest question; I don't follow Zig development. I can't take C seriously because it hasn't even bothered to define provenance until now, but as far as I'm aware, Zig doesn't even try to touch these topics.

Does Zig document the precise mechanics of noalias? Does it provide a mechanism for controllably exposing or not exposing provenance of a pointer? Does it specify the provenance ABA problem in atomics on compare-exchange somehow or is that undefined? Are there any plans to make allocation optimizations sound? (This is still a problem even in Rust land; you can write a program that is guaranteed to exhibit OOM according to the language spec, but LLVM outputs code that doesn't OOM.) Does it at least have a sanitizer like Miri to make sure UB (e.g. data races, type confusion, or aliasing problems) is absent?

If the answer to most of the above is "Zig doesn't care", why do people even consider it better than C?

replies(1): >>44426194 #

63. laqq3 ◴[30 Jun 25 17:29 UTC] No.44425842{3}[source]▶

>>44422457 #

I'm wondering if you could elaborate? I'd be curious to hear more about "bloated C" and the differences between the 2nd and 3rd edition.

replies(1): >>44442670 #

64. grg0 ◴[30 Jun 25 17:29 UTC] No.44425848{3}[source]▶

>>44425323 #

He grew up in a very stringent household. Everybody was writing Rust and he was like, "damn, I wish I could write C."

65. OkayPhysicist ◴[30 Jun 25 17:31 UTC] No.44425869[source]▶

>>44424882 #

Why shouldn't they be? It's not the 00's anymore, Unicode support is universal. You'd have to dust off some truly ancient tech to find something incapable of rendering it.

Source code is for humans, and thus should be written in whatever way makes it easiest to read, write, and understand for humans. If your language doesn't map onto ASCII, then Unicode support improves that goal. If your code is meant to directly implement some physics formula, then using the appropriate unicode characters might make it easier to read (and thus spot transcription errors, something I find far too often in physics simulations).

replies(3): >>44425932 #>>44426021 #>>44426146 #

66. adrianN ◴[30 Jun 25 17:35 UTC] No.44425908{4}[source]▶

>>44425758 #

Using variable names that are different but render (almost) the same can be a bad idea.

67. OkayPhysicist ◴[30 Jun 25 17:37 UTC] No.44425925{4}[source]▶

>>44425223 #

> Little to no source code is written for single (human) language development teams.

This is blatantly false. I'd posit that a solid 90% of all source code written is done so by single, co-located teams (a substantial portion of which are teams of 1). That certainly fits the bill for most companies I've worked at.

68. wheybags ◴[30 Jun 25 17:38 UTC] No.44425932{3}[source]▶

>>44425869 #

Hot take, but I've always felt the world would be better served if mathematicians and physicists would stop using terrible short variable names and use longCamelCaseDescriptiveNames like the rest of us, because paper is cheap, and abbreviations are confusing. I know it's nicer when you're writing by hand, but when you clean up a proof or formula for publishing, would it really be so hard to switch to descriptive names?

I'm a practitioner of neither though, so I can't condemn the practice wholeheartedly as an outsider, but it does make me groan.

replies(2): >>44426036 #>>44426288 #

69. someplaceguy ◴[30 Jun 25 17:46 UTC] No.44426021{3}[source]▶

>>44425869 #

> using the appropriate unicode characters might make it easier to read

It's probably also a great way to introduce almost undetectable security vulnerabilities by using Unicode characters that look similar to each other but in fact are different.

replies(1): >>44426189 #

70. senbrow ◴[30 Jun 25 17:48 UTC] No.44426036{4}[source]▶

>>44425932 #

Long names are good for short expressions, but they obfuscate complex ones because the identifiers visually crowd out the operators.

This can be especially difficult if the author is trying to map 1:1 to a complex algorithm in a white paper that uses domain-standard mathematical notation.

The alternative is to break the "full formula" into simpler expression chunks, but then naming those partial expression results descriptively can be even more challenging.

71. OkayPhysicist ◴[30 Jun 25 17:56 UTC] No.44426108{4}[source]▶

>>44425239 #

When I was doing a lot of Physics simulation in Julia, I had a Vim extension which would just allow me to type something like \gamma, hit tab, and get γ. This was worth the (minimal) hassle, because it made it very easy to spot check formulas. When you're shuffling data around in a loosely-described space like most of web dev, descriptive function and variable names are important because the description of what you're doing and what you're doing it to is the important information, and the actual operations you're taking are typically approximately trivial.

In heavily mathematical contexts, most of those assumptions get turned on their head. Anybody qualified to be modifying a model of electromagnetism is going to be intimately familiar with the language of the formulas: mu for permeability, epsilon for permittivity, etc. With that shared context,

1/(4*π*ε)*(q_electron * q_proton)/r^2 is going to be a lot easier to see, at a glance, as Coulomb's law

compared to

1 / (4 * Math.Pi * permittivity_of_free_space)*(charge_electron * charge_proton)/distance_of_separation^2

Source code, like any other language built for humans, is meant to be read by humans. If those humans have a shared context, utilizing that shared context improves the quality and ease of that communication.
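
For reference, a plain-ASCII C sketch of the same formula. The names `coulomb_force`, `PI`, `EPSILON_0` and `ELEMENTARY_CHARGE` are hypothetical, picked for illustration, and the constants are the usual SI values.

```c
#include <assert.h>

/* Physical constants in SI units. */
static const double PI = 3.14159265358979323846;
static const double EPSILON_0 = 8.8541878128e-12;         /* F/m */
static const double ELEMENTARY_CHARGE = 1.602176634e-19;  /* C */

/* Signed Coulomb force between two point charges q1 and q2 (in C)
 * separated by r metres: positive is repulsive, negative attractive. */
double coulomb_force(double q1, double q2, double r)
{
    assert(r > 0.0);
    return 1.0 / (4.0 * PI * EPSILON_0) * (q1 * q2) / (r * r);
}
```

Calling `coulomb_force(-ELEMENTARY_CHARGE, ELEMENTARY_CHARGE, r)` gives the electron-proton case from the formula above. Whether `EPSILON_0` or `ε` reads better in the body is exactly the trade-off being argued here.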

replies(1): >>44426271 #

72. loeg ◴[30 Jun 25 18:00 UTC] No.44426140[source]▶

>>44424882 #

Math people shouldn't be allowed to write code. It's not the unicode, so much as the extremely terse variable names.

replies(1): >>44426182 #

73. perching_aix ◴[30 Jun 25 18:00 UTC] No.44426141{3}[source]▶

>>44423321 #

I still see it, even after clearing caches, visiting from a separate browser from a separate computer (even a separate network).

74. bigstrat2003 ◴[30 Jun 25 18:01 UTC] No.44426146{3}[source]▶

>>44425869 #

They shouldn't be precisely because it makes the code harder to read and write when you include non-ASCII characters.

75. perching_aix ◴[30 Jun 25 18:04 UTC] No.44426182{3}[source]▶

>>44426140 #

Isn't that basically all C/C++ code? Admittedly I don't have much exposure to it, but it's pretty much a trope in and of itself, along with Java and C# suffering from the opposite problem.

Such a silly issue too, you'd think we'd have come up with some automated wrangling for this, so that those experienced with a codebase can switch over and see super short versions of identifiers, while people new to it all will see the long stuff.

replies(1): >>44428726 #

76. OkayPhysicist ◴[30 Jun 25 18:05 UTC] No.44426189{4}[source]▶

>>44426021 #

This would cause your compilation to fail, unless you were deliberately declaring and using near identical symbols. Which would violate the whole "Code is meant to be easily read by humans" thing.

replies(1): >>44426222 #

77. dnautics ◴[30 Jun 25 18:06 UTC] No.44426194{4}[source]▶

>>44425774 #

safety-wise, zig is better than C because if you don't do "easily flaggable things"[0] it doesn't have buffer overruns (including protection in the case of sentinel strings), or null pointer exceptions. Where this lies on the spectrum of "C to Rust" is a matter of judgement, but if I'm not mistaken it is easily a majority of memory-safety related CVEs. There's also no UB in debug, test, or release-safe. Note: you can opt-out of release-safe on a function-by-function basis. IIUC noalias is safety checked in debug, test, and release-safe.

In a sibling comment, I mentioned a proof of concept I did that if I had the time to complete/do correctly, it should give you near-rust-level checking on memory safety, plus automatically flags sites where you need to inspect the code. At the point where you are using MIRI, you're already bringing extra stuff into rust, so in practice zig + zig-clr could be the equivalent of the result of "what if you moved borrow checking from rustc into miri"

[0] type erasure, or using "known dangerous types, like c pointers, or non-slice multipointers".

replies(1): >>44427336 #

78. someplaceguy ◴[30 Jun 25 18:09 UTC] No.44426222{5}[source]▶

>>44426189 #

> unless you were deliberately declaring and using near identical symbols.

Yes, that would probably be one way to do it.

> Which would violate the whole "Code is meant to be easily read by humans" thing.

I'd think someone who's deliberately and sneakily introducing a security vulnerability would want it to be undetectable, rather than easily readable.

79. layer8 ◴[30 Jun 25 18:15 UTC] No.44426264[source]▶

>>44425547 #

This is about the representation and alignment of the pointer object, not about the object being pointed to. And C requires struct pointer types to all have the same representation and alignment. This is generally necessary due to the possibility of having pointers to opaque struct declarations in a translation unit.

Regarding your second point, if I understand the model correctly, there is only an ambiguity in pointer provenance if the adjacent objects are independent "storage instances", i.e. separately malloc'ed objects or separate variables on the stack — not between fields of the same struct.
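
A short sketch of the first distinction, reusing the two structs from the parent comment and assuming a C11 compiler for `_Alignof`. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct B { char x; };
struct A { struct B b; long long y; };

/* Returns 1 if the two pointer types have the same size and alignment.
 * This is about the pointer objects themselves, not about the structs
 * they point to. */
int struct_pointers_match(void)
{
    return sizeof(struct A *) == sizeof(struct B *)
        && _Alignof(struct A *) == _Alignof(struct B *);
}

/* Returns 1 if the pointed-to struct types differ in alignment, which
 * is typical for this pair but not guaranteed. */
int struct_alignments_differ(void)
{
    return _Alignof(struct A) != _Alignof(struct B);
}

/* A pointer to a struct, suitably converted, points to its first member. */
struct B *first_member(struct A *a)
{
    assert(a != NULL);
    return (struct B *)a;
}
```

On common 64-bit targets `struct_alignments_differ()` returns 1, since `struct A` is typically 8-byte aligned and `struct B` 1-byte aligned, while `struct_pointers_match()` returns 1 everywhere.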

80. eqvinox ◴[30 Jun 25 18:15 UTC] No.44426271{5}[source]▶

>>44426108 #

Hrm. Fair point. But will the other humans, even if they have the shared context, also have the ability to type in these symbols, if they want to edit the code? They probably don't have your vim extension…

I guess maybe this is an argument for better UI/UX for symbolic input…

81. nsingh2 ◴[30 Jun 25 18:18 UTC] No.44426288{4}[source]▶

>>44425932 #

Better served to students and those unfamiliar with the field, but noisy to those familiar. Considering that much of mathematical work is done using pen/paper, it would be a total pain to write out huge variable names every time.

Consider a simple programming example, in C blocks are delimited by `{}`, why not use `block_begin` and `block_end`? Because it's noisy, and it doesn't take much to internalize the meaning of braces.

82. SV_BubbleTime ◴[30 Jun 25 18:23 UTC] No.44426336[source]▶

>>44424882 #

> void recip(double* aₚ, double* řₚ)
> {
>     for (;;)
>     {
>         register double Π = (*aₚ)*(*řₚ);

My first thought before I saw this was "I wonder if this is going to be an article from people who build things or something from 'academics' that don't."

At least it was answered quickly.

83. dsp_person ◴[30 Jun 25 18:26 UTC] No.44426362[source]▶

>>44421185 (OP) #

    if ((Π⁻ &lt; Π) &amp;&amp; (Π &lt; Π⁺)) {

I spent way too long trying to figure this out as C code instead of

    if ((Π⁻ < Π) && (Π < Π⁺)) {

84. aengelke ◴[30 Jun 25 18:38 UTC] No.44426467[source]▶

>>44422412 #

It's probably worth noting that TySan currently only catches aliasing violations that LLVM would be able to exploit. For some types, e.g. unions, Clang doesn't emit accurate type-based aliasing information and therefore TySan won't catch these.

replies(1): >>44428607 #

85. ZoomZoomZoom ◴[30 Jun 25 19:03 UTC] No.44426725{4}[source]▶

>>44425432 #

I know what you mean and I shudder when I see code that uses words from my native lang, but most code is human-oriented.

86. nottorp ◴[30 Jun 25 19:16 UTC] No.44426840{8}[source]▶

>>44424740 #

Dunno about the OP but I'm very aware as I'm not an English speaker.

I still don't want anything as unpredictable as Unicode in my code. How many different encodings will display as the same variable name and how is the compiler supposed to decide?

If you're thinking of comments and user facing strings, the OP already excluded those.

replies(1): >>44435280 #

87. gustedt ◴[30 Jun 25 19:21 UTC] No.44426876[source]▶

>>44421185 (OP) #

Randomly introduced translation errors from markdown to wordpress-internal should be fixed, now. Sorry for the inconvenience!

replies(1): >>44430717 #

88. tialaramex ◴[30 Jun 25 20:12 UTC] No.44427336{5}[source]▶

>>44426194 #

This is very much a "Draw the rest of the fucking owl" approach to safety.

replies(1): >>44427912 #

89. tialaramex ◴[30 Jun 25 20:26 UTC] No.44427476{3}[source]▶

>>44423771 #

Fil-C does seem like a quicker route if your existing idea was something like "rewrite it in Java" and it exists today whereas both C and C++ have only vague ambitions to deliver some future language which might meet your needs.

I will be very surprised if there's widespread adoption of Fil-C for many new projects though.

replies(1): >>44430422 #

90. tialaramex ◴[30 Jun 25 20:35 UTC] No.44427573{7}[source]▶

>>44424637 #

Even with warnings this is just terrible. People need to stop inventing languages where "False" is true, or an empty container is false or other insane "coercions" of this kind.

True is true, and false is false, if you're wondering whether this Doodad is Wibbly, you should ask that question not rely on a convention that Wibbly Doodads are somehow "truthy" while the non-Wibbly ones are not.

91. nikic ◴[30 Jun 25 20:45 UTC] No.44427669[source]▶

>>44421185 (OP) #

At least at a skim, what this specifies for exposure/synthesis for reads/writes of the object representation is concerning. One of the consequences is that dead integer loads cannot be eliminated, as they may have an exposure side effect. I guess C might be able to get away with it due to the interaction with strict aliasing rules. Still quite surprised that they are going against consensus here (which also reduces the likelihood that these semantics will get adopted by implementers).

replies(4): >>44427836 #>>44428359 #>>44428989 #>>44432092 #

92. uecker ◴[30 Jun 25 21:01 UTC] No.44427836[source]▶

>>44427669 #

(Never mind, I misread your comment at first.) Yes, the representation access needs to be discussed... I took a couple of years to publish this document. More important would be if the ptr2int exposure could be implemented.

93. hinkley ◴[30 Jun 25 21:04 UTC] No.44427855[source]▶

>>44421185 (OP) #

> Unfortunately no C compiler can do this optimization automatically:

> The functions recip and recip⁺ are not equivalent.

This is one of those examples of how optimizing code can improve legibility, robustness, or both.

The first implementation allows for side effects to change the outcome of the function. But the problem is that the code is not written expecting someone to modify the values in the middle of the loop. It's incorrect behavior, and you're paying a performance penalty for it to boot.

Functional Core code tends not to have this problem, in that we pass in a snapshot of data and it either gets an answer or an error.

I've seen too much code that checks 3 times if a user is either still logged in or has permission to do a task, and not one of them was set up to deal with one answer for the first call and a different one for any of the subsequent ones. They just go into undefined behavior.
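A minimal sketch of the "snapshot" style described above, assuming a Newton iteration for the reciprocal. `recip_snapshot` and its fixed step count are made up for illustration and are not the article's code. The inputs are read once into locals, so nothing that happens to `*a_p` or `*r_p` in the middle of the loop can change the outcome:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the article's recip(): refines *r_p toward 1 / *a_p.
 * Both inputs are copied into locals up front, so the loop works on a snapshot
 * and the result is written back exactly once. */
void recip_snapshot(const double *a_p, double *r_p, int steps)
{
    assert(a_p != NULL && r_p != NULL);
    const double a = *a_p;  /* snapshot of the input */
    double r = *r_p;        /* snapshot of the initial guess */
    for (int i = 0; i < steps; ++i) {
        r = r * (2.0 - a * r);  /* Newton step for f(r) = 1/r - a */
    }
    *r_p = r;  /* single store at the end */
}
```

Because the loop only touches locals, the compiler is free to keep `a` and `r` in registers without reasoning about aliasing between the two pointers.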

94. dnautics ◴[30 Jun 25 21:09 UTC] No.44427912{6}[source]▶

>>44427336 #

what percentage of CVEs are null pointer problems or buffer overflows? That's what percentage of the owl has been drawn. If someone (or me) builds out a proper zig-clr, then we get to, what? 90%. Great. Probably good enough, that's not far off from where rust is.

replies(1): >>44428393 #

95. jaisio ◴[30 Jun 25 21:39 UTC] No.44428200[source]▶

>>44421185 (OP) #

The root cause of all this is that C programs are not much more than glorified assembly programs. Any effort to retrofit higher level reasoning will always be defeated by somebody doing some dirty pointer tricks. This can only be solved by more abstract ways to express programs, which necessarily restrict the bare metal dirty things one can do. But what you gain is that the compiler will easily be able to do lots of things which a C compiler can't do or only with a lot of headache. The kind of stuff this article is about is really trying to solve the wrong problem IMO.

96. RossBencina ◴[30 Jun 25 21:44 UTC] No.44428235{4}[source]▶

>>44425758 #

Mathematics is a language that doesn't fit into ASCII and commonly uses one-character variable names. If you are implementing a documented mathematical algorithm (i.e. one with a description in a paper or book) then sticking to the notation of the paper (i.e. using one character variable names) makes sense to me.

replies(2): >>44428392 #>>44428481 #

97. comex ◴[30 Jun 25 21:59 UTC] No.44428359[source]▶

>>44427669 #

> I guess C might be able to get away with it due to the interaction with strict aliasing rules.

But not for char-typed accesses. And even for larger types, I think you would have to worry about the combo of first memcpying from pointer-typed memory to integer-typed memory, then loading the integer. If you eliminate dead integer loads, then you would have to not eliminate the memcpy.
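A minimal sketch of the memcpy-then-load combination described above. The function name is made up for illustration, and the sketch assumes `uintptr_t` exists and has the same size as `int *`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the object representation of a pointer into integer-typed storage
 * and then loads it as an integer. No pointer-to-integer cast appears, yet
 * the returned value carries the address. Under the model being discussed,
 * this integer load is where exposure could happen. */
uintptr_t launder_through_memcpy(int *p)
{
    _Static_assert(sizeof(uintptr_t) == sizeof(int *), "sketch assumes equal sizes");
    uintptr_t bits;
    memcpy(&bits, &p, sizeof bits);  /* pointer-typed memory -> integer-typed memory */
    return bits;                     /* integer load of bytes that came from a pointer */
}
```

Whether the result equals `(uintptr_t)p` is implementation-defined; it does on mainstream flat-address platforms.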

replies(1): >>44437436 #

98. RossBencina ◴[30 Jun 25 22:02 UTC] No.44428380[source]▶

>>44421185 (OP) #

After reading the fine article I'm left wondering what if you implement your own heterogeneous allocation scheme on top of malloc? (e.g. TLSF) In this case all of your objects will belong to the same malloced storage region, and you will compute object offsets using raw pointers, but I'd expect provenance to potentially treat each returned object to behave as if it were allocated from a separate disjoint storage.

I guess my question is: does this provenance model allow for recursive nesting of allocators with a separate notion of "storage" at each level?
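To make the question concrete, here is a minimal bump-allocator sketch (far simpler than TLSF). The `arena` type and its functions are hypothetical names. Every pointer handed out is computed from the single `malloc` result, so, as far as I can tell, they would all share the provenance of that one storage instance:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one malloc'ed storage instance, carved up by offset. */
typedef struct {
    unsigned char *base;  /* start of the malloc'ed region */
    size_t size;          /* total bytes in the region */
    size_t used;          /* bytes handed out so far */
} arena;

int arena_init(arena *a, size_t size)
{
    assert(a != NULL);
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

/* Returns a sub-range of the single region, or NULL when it does not fit.
 * The result is derived from a->base by pointer arithmetic only. */
void *arena_alloc(arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;  /* round up */
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}

void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

Nothing here tells the compiler that two results of `arena_alloc` are disjoint objects; that is the part the question is about.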

replies(1): >>44428541 #

99. mananaysiempre ◴[30 Jun 25 22:03 UTC] No.44428392{5}[source]▶

>>44428235 #

Unfortunately, many of the things of this nature that you’ll want to implement use indices, which are inevitably going to start at 1. So you’ve still got plenty of hours of unpleasant debugging ahead of you, and a non-obvious correspondence to the original paper at the end of it.

100. comex ◴[30 Jun 25 22:04 UTC] No.44428393{7}[source]▶

>>44427912 #

Probably >50% of exploits these days target use-after-frees, not buffer overflows. I don’t have hard data though.

As for null pointer problems, while they may result in CVEs, they’re a pretty minor security concern since they generally only result in denial of service.

Edit 2: Here's some data: In an analysis by Google, the "most frequently exploited" vulnerability types for zero-day exploitation were use-after-free, command injection, and XSS [3]. Since command injection and XSS are not memory-unsafety vulnerabilities, that implies that use-after-frees are significantly more frequently exploited than other types of memory unsafety.

Edit: Zig previously had a GeneralPurposeAllocator that prevented use-after-frees of heap allocations by never reusing addresses. But apparently, four months ago [1], GeneralPurposeAllocator was renamed to DebugAllocator and a comment was added saying that the safety features "require the allocator to be quite slow and wasteful". No explicit reasoning was given for this change, but it seems to me like a concession that applications that need high performance generally shouldn't be using this type of allocator. In addition, it appears that use-after-free is not caught for stack allocations [2], or allocations from some other types of allocators.

Note that almost the entire purpose of Rust's borrow checker is to prevent use-after-free. And the rest of its purpose is to prevent other issues that Zig also doesn't protect against: tagged-union type confusion and data races.

[1] https://github.com/ziglang/zig/commit/cd99ab32294a3c22f09615...

[2] https://github.com/ziglang/zig/issues/3180.

[3] https://cloud.google.com/blog/topics/threat-intelligence/202...

replies(1): >>44429957 #

101. kevincox ◴[30 Jun 25 22:14 UTC] No.44428481{5}[source]▶

>>44428235 #

I find math far easier to read when the authors use proper names for variables. But I understand that it isn't the idiomatic style and agree that it can be useful to match the paper when re-implementing an algorithm.

102. f33d5173 ◴[30 Jun 25 22:20 UTC] No.44428541[source]▶

>>44428380 #

The compiler knows about malloc, and hence knows that the pointer returned by malloc won't alias any other pointer. Your compiler might support some attribute to mark a function as behaving like malloc in this respect. Otherwise the compiler will be forced to assume the return value could alias any other pointer.
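One such attribute, as a sketch: GCC and Clang accept `__attribute__((malloc))`. The `LIKE_MALLOC` macro and the `checked_alloc` wrapper are hypothetical names, and other compilers simply get an empty macro here:

```c
#include <assert.h>
#include <stdlib.h>

/* GCC and Clang understand __attribute__((malloc)); elsewhere this expands to nothing. */
#if defined(__GNUC__)
#define LIKE_MALLOC __attribute__((malloc))
#else
#define LIKE_MALLOC
#endif

/* Hypothetical wrapper: the attribute promises that the returned pointer does
 * not alias any other pointer that is valid when the function returns. */
LIKE_MALLOC void *checked_alloc(size_t n)
{
    void *p = malloc(n ? n : 1);  /* avoid the implementation-defined malloc(0) */
    if (p == NULL) {
        abort();
    }
    return p;
}
```

The attribute is a promise made by the programmer; the compiler does not verify it, so putting it on a function that returns interior pointers of a shared region would be a bug.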

replies(1): >>44435458 #

103. flohofwoe ◴[30 Jun 25 22:27 UTC] No.44428607{3}[source]▶

>>44426467 #

Which is fine I think, considering that union type punning is legal in C (and even in C++ where union type punning is UB I have never seen it break - theoretically it might of course).
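For reference, a minimal sketch of the union punning in question next to the `memcpy` alternative. The function names are illustrative, and the sketch assumes a 32-bit IEEE 754 `float`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads the bits of a float through a union member of a different type.
 * C permits this (the bytes are reinterpreted as the new type); in C++ it is UB. */
uint32_t float_bits_union(float f)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Same result via memcpy, which is well-defined in both C and C++. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```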

104. flohofwoe ◴[30 Jun 25 22:35 UTC] No.44428659[source]▶

>>44424859 #

It has a slightly different meaning now, instead of hinting to the compiler that the variable should be placed in a register it now means that it is illegal to take the address of the variable (e.g. cannot create a pointer from it):

https://www.godbolt.org/z/eEYf5c59f

Might be useful in some situations although I currently can't think of any :)
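A small sketch of that restriction, with an illustrative function name. The commented-out line is the one a conforming compiler has to diagnose:

```c
#include <assert.h>
#include <stddef.h>

/* Sums an array using a register-qualified accumulator.
 * The qualifier does not force a CPU register; it only forbids &total. */
long sum_with_register(const int *values, size_t count)
{
    register long total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    /* long *p = &total;   <- constraint violation: address of a register variable */
    return total;
}
```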

replies(1): >>44429293 #

105. flohofwoe ◴[30 Jun 25 22:45 UTC] No.44428726{4}[source]▶

>>44426182 #

> Isn't that basically all C/C++ code?

Maybe for code that was written in the early 90's, but the only 'tradition' that has survived is calling the vanilla loop variable 'i'.

106. ◴[30 Jun 25 22:51 UTC] No.44428768{3}[source]▶

>>44425323 #

107. ben0x539 ◴[30 Jun 25 23:24 UTC] No.44428989[source]▶

>>44427669 #

Can you say more about what the consensus is that this is going against?

replies(1): >>44437131 #

108. eqvinox ◴[01 Jul 25 00:11 UTC] No.44429293{3}[source]▶

>>44428659 #

I mean, yeah, but that feature is really only an aid for the programmer in self-enforcing that rule; the compiler already knows whether the address of the variable is taken anywhere, and can behave as is useful if it isn't taken anywhere…

Doesn't feel particularly valuable to have that "help" from the compiler against "accidentally" taking the address of a variable… I mean, how do you even accidentally do that?

replies(1): >>44431652 #

109. nixpulvis ◴[01 Jul 25 01:37 UTC] No.44429750[source]▶

>>44421185 (OP) #

As a bit of an aside, the XOR doubly linked list example given here is super cool.

110. dnautics ◴[01 Jul 25 02:14 UTC] No.44429957{8}[source]▶

>>44428393 #

yeah I don't think the GPA is really a great strategy for detecting UAF, but it was a good try. It basically creates a new virtual page for each allocation, so the kernel gets involved and ?I think? there is more indirection for any given pointer access. So you can imagine why it wasn't great.

Anyways, I am optimistic that UAF can be prevented by static analysis:

https://www.youtube.com/watch?v=ZY_Z-aGbYm8

Note since this sort of technique interfaces with the compiler, unless the dependency is in a .so file, it will detect UAF in dependencies too, whether or not the dependency chooses to run the static analysis as part of their software quality control.

replies(1): >>44436348 #

111. cryptonector ◴[01 Jul 25 03:47 UTC] No.44430407[source]▶

>>44421185 (OP) #

:thank you:

This is great. I wonder what u/pizlonator thinks of it.

112. cryptonector ◴[01 Jul 25 03:51 UTC] No.44430422{4}[source]▶

>>44427476 #

A big stumbling block is that Fil-C requires all C in the program to be built with Fil-C, including all libraries. That means that Debian and such would need to either adopt Fil-C (perhaps for some distros) or ship Fil-C and non-Fil-C libraries for all pkgs with libraries. The alternative is that you have to build everything yourself, and this gets painful if you need to support ELFs/DLLs.

113. cryptonector ◴[01 Jul 25 04:50 UTC] No.44430684{5}[source]▶

>>44423210 #

> C has all the things to hate in a programming language

> CaSe Sensitivity

Wait, what, you.. you want a case-insensitive language? Like SQL?

I love SQL, but please no more case-insensitive programming languages!

114. cryptonector ◴[01 Jul 25 04:57 UTC] No.44430717[source]▶

>>44426876 #

There are some grammar errors here and there, but TFA is very nice. Thank you for your hard work!

115. uecker ◴[01 Jul 25 05:21 UTC] No.44430826[source]▶

>>44422412 #

The problem might be that Clang does not even implement type-based aliasing correctly. So I assume it checks its broken rules, instead of the ones specified in the C standard.

116. account42 ◴[01 Jul 25 07:43 UTC] No.44431566{8}[source]▶

>>44424740 #

And those are not programming languages, or at least not the C programming language which only needs a very limited character set.

replies(1): >>44436431 #

117. flohofwoe ◴[01 Jul 25 07:58 UTC] No.44431652{4}[source]▶

>>44429293 #

I guess you're not a fan of 'const' either? ;)

replies(1): >>44433424 #

118. kzrdude ◴[01 Jul 25 09:19 UTC] No.44432063{8}[source]▶

>>44424558 #

To be sure, it's a joke. Mostly trying to joke at the expense of these excessively complicated variable names (that are only there because it's pseudocode) :)

And yeah, the Chinese tone in practice does not align with the idea of "down a little up a lot" either. It depends on context...

119. alextingle ◴[01 Jul 25 09:27 UTC] No.44432092[source]▶

>>44427669 #

I don't imagine that the exposed state would need to be represented in the final compiler output, so the optimiser could mark the pointer as exposed, but still eliminate the dead integer load.

Or from a pragmatic viewpoint, perhaps if the optimiser eliminates a dead load, then don't mark the pointer as exposed? After all, the whole point is to keep track of whether a synthesised pointer might potentially refer to the exposed pointer's storage. There's zero danger of that happening if the integer load never actually occurs.

replies(1): >>44437878 #

120. Measter ◴[01 Jul 25 12:45 UTC] No.44433373[source]▶

>>44421185 (OP) #

In the section about the ambiguous provenance from synthesising pointers, it's explained that the compiler will infer the correct provenance from usage. Would it not be worth having some way for the programmer to inform the compiler directly, with something analogous to Rust's Strict Provenance ptr::with_addr?

To convert it to C syntax, it's a function with roughly this signature:

    void* with_addr(void* ptr, uintptr_t addr)

Where the returned pointer has the address of `addr` and the provenance of `ptr`.
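A minimal sketch of how such a function could be written in today's C, assuming a flat address space where integer differences equal byte offsets. This is a hypothetical helper, not a standard function. The result is computed by pointer arithmetic on `ptr`, so it is derived from `ptr` and no integer-to-pointer cast is involved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analogue of Rust's ptr::with_addr.
 * Only meaningful when `addr` lies inside (or one past the end of) the object
 * that `ptr` points into; otherwise the pointer arithmetic is undefined. */
void *with_addr(void *ptr, uintptr_t addr)
{
    /* Byte distance from ptr to the requested address (may be negative). */
    ptrdiff_t delta = (ptrdiff_t)(addr - (uintptr_t)ptr);
    return (char *)ptr + delta;
}
```

A typical use would be stripping a tag bit: `with_addr(base, tagged & ~(uintptr_t)1)`. Note that the `(uintptr_t)ptr` cast inside still counts as a pointer-to-integer conversion, so under the article's model it would expose `ptr`.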

replies(3): >>44434377 #>>44435332 #>>44438710 #

121. eqvinox ◴[01 Jul 25 12:52 UTC] No.44433424{5}[source]▶

>>44431652 #

I am a fan of "const", because it is useful in expressing API constraints and behavior. Contrast against putting "register" on a function parameter being useless because either it's passed by value, then it's a copy anyway and register is meaningless to the caller, or it's a pointer, in which case it again does nothing because you already have a pointer (and something somewhere is very confused.)

122. bmn__ ◴[01 Jul 25 12:59 UTC] No.44433479[source]▶

>>44422693 #

https://github.com/tsoding/crust

123. charleslmunger ◴[01 Jul 25 14:43 UTC] No.44434377[source]▶

>>44433373 #

This is doable via this trick:

https://github.com/protocolbuffers/protobuf/blob/ae0129fcd01...

124. cryptonector ◴[01 Jul 25 16:03 UTC] No.44435280{9}[source]▶

>>44426840 #

The language and compiler & linker should reject Zalgo in identifiers, and they should reject confusable script mixes in identifiers, but otherwise they treat all equivalent strings as equivalent. To make it easier on the linker compilers should normalize all symbols to one common form (e.g., NFC).

125. cryptonector ◴[01 Jul 25 16:08 UTC] No.44435332[source]▶

>>44433373 #

I'd also like to have builtin functions and/or function attributes for designating allocation and deallocation. malloc() and free() (and realloc()) should not be special because of their names -- they should be special because of their declared attributes or their derived attributes given their internals.

126. cryptonector ◴[01 Jul 25 16:15 UTC] No.44435410{4}[source]▶

>>44425432 #

Yes, I also think the whole world should program in English.

That's half tongue in cheek. I am fluent in three languages, but I program "in English" and I greatly appreciate that my colleagues who are fluent in languages other than the ones I'm fluent in (except English) also do. Basically English is the world's lingua franca today. Nonetheless if a company in France wants to use French for their symbol names, or a company in Mexico wants to use Spanish for their symbol names, or a company in China wants to use Chinese for their symbol names, who am I to stop them?! Surely it's not my place.

127. cryptonector ◴[01 Jul 25 16:17 UTC] No.44435421{4}[source]▶

>>44425239 #

> […could some of the downvoters explain why they're downvoting?]

Because you made false assertions ("And you pretty much can't type them on any keyboard").

replies(1): >>44436104 #

128. cryptonector ◴[01 Jul 25 16:20 UTC] No.44435458{3}[source]▶

>>44428541 #

IMO there should be attributes for declaring allocators. Or builtin functions that have the effect of marking their callers with such attributes (e.g., an `__allocated()` function to say a pointer is indeed now to be considered a pointer to a new storage allocation, with a given size and optional type, and a `__freed()` function to say that a pointer is indeed now to be considered a dangling pointer to a deallocated object).

129. eqvinox ◴[01 Jul 25 17:20 UTC] No.44436104{5}[source]▶

>>44435421 #

Please show me the keyboard layout that has keys for ⁺, ř and ₚ.

(Unless you're being pedantic because I wrote "keyboard" rather than "keyboard layout", or ignored the qualifying "pretty much". In either of those cases you're unwilling to communicate cooperatively and I can't help you.)

replies(1): >>44438635 #

130. comex ◴[01 Jul 25 17:47 UTC] No.44436348{9}[source]▶

>>44429957 #

Fair enough. In some sense you’re writing your own borrow checker. But (you may know this already) be warned: this has been tried many times for C++, with different levels of annotation burden imposed on programmers.

On one side are the many C++ “static analyzers” like Coverity or clang-analyzer, which work with unannotated C++ code. On the other side is the “Safe C++” proposal (safecpp.org), which is supposed to achieve full safety, but at the cost of basically transplanting Rust’s type system into C++, requiring all functions to have lifetime annotations and disallow mutable aliasing, and replacing the entire standard library with a new one that follows those rules. Between those two extremes there have been tools like the C++ Core Guidelines Checker and Clang’s lifetimebound attribute, which require some level of annotations, and in turn provide some level of checking.

So far, none of these have been particularly successful in preventing memory safety vulnerabilities. Static analyzers are widely used in industry but only find a fraction of bugs. Safe C++ will probably be too unpopular to make it into the spec. The intermediate solutions have some fundamental issues (see [1], though it’s written by the author of Safe C++ and may be biased), and in practice haven’t really taken off.

But I admit that only the “static analyzer” side of the solution space has been extensively explored. The other projects are just experiments whose lack of adoption may be due to inertia as much as inherent lack of merit.

And Zig may be different… I’m not a Zig programmer, but I have the impression that compared to C++ it encourages fewer allocations and smaller codebases, both of which may make lifetime analysis more tractable. It’s also a much younger language whose audience is necessarily much more open to change.

So we’ll see. Good luck - I’d sure like to see more low-level languages offering memory safety.

[1] https://www.circle-lang.org/draft-profiles.html

replies(3): >>44436441 #>>44436589 #>>44465759 #

131. steveklabnik ◴[01 Jul 25 17:54 UTC] No.44436431{9}[source]▶

>>44431566 #

C does allow for limited unicode in identifiers, though you need to use the \u prefix and write the code out. Compilers like clang let it work like C++ and follow TR31, though this is nonstandard.
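A tiny sketch of the escaped spelling, assuming a compiler that accepts universal character names in identifiers (recent GCC and Clang do). The function name is made up:

```c
#include <assert.h>

/* The local variable is spelled with a universal character name:
 * U+03C0 is the Greek small letter pi, so the identifier is a single letter. */
int pi_times_two(void)
{
    int \u03c0 = 3;
    return 2 * \u03c0;
}
```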

replies(1): >>44441249 #

132. steveklabnik ◴[01 Jul 25 17:55 UTC] No.44436441{10}[source]▶

>>44436348 #

> Safe C++ will probably be too unpopular to make it into the spec.

Not just that, but the committee accepted a paper that basically says its design is against C++'s design principles, so it's effectively dead forever.

replies(1): >>44437031 #

133. tialaramex ◴[01 Jul 25 18:09 UTC] No.44436589{10}[source]▶

>>44436348 #

One of the key things in Sean's "Safe C++" is that, like Rust, it actually technically works. If we write software in the safe C++ dialect we get safe programs just as if we write ordinary safe (rather than ever invoking "unsafe") Rust we get safe programs. WG21 didn't take Safe C++ and it will most likely now be a minor footnote in history, but it did really work.

"I think this could be possible" isn't an enabling technology. If you write hard SF it's maybe useful to distinguish things which could happen from those which can't, but for practical purposes it only matters if you actually did it. Sean's proposed "Safe C++" did it, Zig, today, did not.

There are other obstacles - like adoption, as we saw for "Safe C++" - but they're predicated on having the technology at all, you cannot adopt technologies which don't exist, that's just make believe. Which I think is already the path WG21 has set out on.

134. tialaramex ◴[01 Jul 25 19:03 UTC] No.44437031{11}[source]▶

>>44436441 #

This was adopted as standing document SD-10 https://isocpp.org/std/standing-documents/sd-10-language-evo...

Here's somebody who was in the room explaining how this was agreed as standing policy for the C++ programming language.

"It was literally the last paper. Seen at the last hour. Of a really long week. Most everyone was elsewhere in other working group meetings assuming no meaningful work was going to happen."

135. nikic ◴[01 Jul 25 19:14 UTC] No.44437131{3}[source]▶

>>44428989 #

That type punning through memory does not expose or synthesize memory. There are some possible variations on this, but the most straightforward is that pointer to integer transmutes just return the address (without exposure) and integer to pointer transmutes return a pointer with nullary provenance.

136. nikic ◴[01 Jul 25 19:52 UTC] No.44437436{3}[source]▶

>>44428359 #

That's a great point. I initially thought we could assume no exposure for loads with non-pointer-compatible TBAA, but you are right that this is not correct if the memory has been laundered through memcpy.

replies(1): >>44440646 #

137. Hercuros ◴[01 Jul 25 20:48 UTC] No.44437878{3}[source]▶

>>44432092 #

I guess the internal exposure state would be “wrong” if the compiler removes the dead load (e.g. in a pass that runs before provenance analysis).

However, if all of the program paths from that point onward behave the same as if the pointer was marked as exposed, that would be fine. It’s only “wrong” to track the incorrect abstract machine state when that would lead to a different behaviour in the abstract machine.

In that sense I suppose it’s no different from things like removing a variable initialisation if the variable is never used. That also has a side effect in the abstract machine, but it can still be optimised out if that abstract machine side effect is not observable.

138. cryptonector ◴[01 Jul 25 22:46 UTC] No.44438635{6}[source]▶

>>44436104 #

Search for compose key sequences.

replies(1): >>44438766 #

139. uecker ◴[01 Jul 25 22:56 UTC] No.44438710[source]▶

>>44433373 #

The proposal is mostly designed this way to make sure existing code is valid. One could add something "with_addr", but I am not convinced that it is really worth it.

140. eqvinox ◴[01 Jul 25 23:05 UTC] No.44438766{7}[source]▶

>>44438635 #

> Search for compose key sequences.

I don't need to do that because I actively use them myself and have a custom ~/.XCompose. Also, please try communicating less condescendingly.

There is no default compose sequence for ₚ that I can find, at least in my Debian installation.

So, again, please point me at the layout that can output these characters.

And even with that: if you don't think Compose sequences, possibly even custom, are covered by "pretty much impossible", I must seriously question your perception & bias of how common (or not) things are.

141. uecker ◴[02 Jul 25 06:02 UTC] No.44440646{4}[source]▶

>>44437436 #

You can still eliminate the memcpy if you mark the pointer exposed at this point.

142. account42 ◴[02 Jul 25 08:14 UTC] No.44441249{10}[source]▶

>>44436431 #

Yes, these are the relatively recent additions being discussed here. C and C++ managed just fine for ages without them before the committees decided that scoring brownie points with performative changes was more important than security and readability of source files.

143. lioeters ◴[02 Jul 25 12:00 UTC] No.44442670{4}[source]▶

>>44425842 #

I was curious about this too, and found some discussion related to the topic of "bloated C" when the 3rd edition was announced.

The C23 edition of Modern C - https://news.ycombinator.com/item?id=41850017

Like this comment:

> Wow, the use of attributes like [[__unsequenced__]], [[maybe_unused]] and [[noreturn]] throughout the book is really awful. It seems pretty pedantic of the author to litter all the code examples with something that is mostly optional.

Or this one:

> Personally this just makes C much more complicated for me, and I choose C when I want simplicity. If I want complicated, I would just pick C++ which I typically would never want.

Examples of what people consider "bloat" in newer C standards:

    _BitInt(N), guard, defer, auto, constexpr, nullptr

    _Generic, typeof, restrict, syntax based tls

replies(1): >>44449104 #

144. johnisgood ◴[02 Jul 25 21:35 UTC] No.44449104{5}[source]▶

>>44442670 #

Yeah, I have a comment that I cannot find right now, they mention many of these things as well. They are all C++-esque, i.e. bloat, in my opinion.

Edit: oh, you actually did quote me, too: https://news.ycombinator.com/item?id=41854897

In any case, thank you.

145. dnautics ◴[04 Jul 25 16:13 UTC] No.44465759{10}[source]▶

>>44436348 #

> Good luck

Thanks! I think this could be implemented as a (3rd party?) compiler backend.

And yeah, if it gets done quickly enough (before 1.0?) it could get enough momentum that it gets accepted as "considered to be best practice".

Honestly, though, I think the big hurdle for C/C++ static analysis is that lots of dependencies get shipped around as .so's and once that happens it's sort of a black hole unless 1) the dependency's provider agrees to run the analysis or 2) you can easily shim to annotate what's going on in the library's headers. 2) is a pain in the ass, and begging for 1) can piss off the dependency's owner.
