The provenance memory model for C

(gustedt.wordpress.com)
224 points by HexDecOctBin | 33 comments
1. smcameron
Ugh. Are Unicode variable names allowed in C now? That's horrific.
2. mananaysiempre
“Now” as in since C99, twenty-five years ago, yes. (It seemed like a good idea at the time.)
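Here is a minimal sketch of what that permits. The names `π` and `circle_area` are hypothetical and not from the article. C99 added universal character names such as `\u03C0` to the identifier grammar and let implementations accept other extended characters. GCC 10+ and Clang also accept the UTF-8 character typed directly, and this sketch assumes such a compiler.

```c
/* U+03C0 GREEK SMALL LETTER PI used directly as an identifier.
   Assumes a compiler that accepts UTF-8 identifiers (e.g. GCC 10+ or Clang);
   the portable C99 spelling of the same name is the UCN form. */
static const double π = 3.14159265358979323846;

/* Area of a circle of radius r, written the way a formula sheet would. */
double circle_area(double r)
{
    return π * r * r;
}
```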
3. 1over137
Horrific? You might not think so if your (human) language used a different alphabet.
4. 90s_dev
See also https://www.ethiocloud.com/bunnascript.aspx and https://en.wikipedia.org/wiki/Non-English-based_programming_...
5. ajross
Little to no source code is written for single (human) language development teams. Sure, everyone would like the ability to write source code in their native language. That's natural.

Literally no one, anywhere, wants to be forced to read source written in a language they can't read (or more specifically in this case: written in glyphs they can't even produce on their keyboard). That idea, for almost everyone, seems "horrific", yeah.

So a lingua franca is a firm requirement for modern software development outside of extremely specific environments (FSB malware authors probably don't care about anyone else reading their Cyrillic variable names, etc...). Must it be ASCII-encoded English? No. But that's what the market has picked and most people seem happy enough with it.

6. eqvinox
Yes but also no. The thing about software is that 90% of it is not culturally bound. If you're writing, say, some tax reporting tool, a grammar reference, or something religious… sure, it makes sense to write that in your language. So, yeah, C should support that.

However, everything else, from spreadsheet software to CAD tools to OS kernels to JavaScript frameworks is universal across cultures and languages. And for better or for worse (I'm not a native English speaker either), the world has gone with English for a lot of code commons.

And the thing with the examples in that post is that they aren't about supporting language diversity; they're math symbols, which are no one's native language. And you pretty much can't type them on any keyboard. Which really makes it a rather poor flex IMHO. Did the author reconfigure their keyboard layout for that specific math use case? It can't generically cover "all of math" either. Or did they copy & paste them around? That's just silly.

[…could some of the downvoters explain why they're downvoting?]
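C offers one partial answer to the typing problem. Since C99, any character allowed in an identifier can also be spelled as a universal character name. That means `aₚ` can be written in plain ASCII as `a\u209A` (U+209A is LATIN SUBSCRIPT SMALL LETTER P). This minimal sketch assumes a compiler, such as GCC or Clang, that treats the two spellings as the same identifier. `scaled` is a hypothetical function that only borrows the `aₚ` name from the article.

```c
/* Declared with the UCN spelling (U+209A, LATIN SUBSCRIPT SMALL LETTER P),
   which needs nothing beyond ASCII keys to type... */
static double a\u209A = 0.5;

/* ...and used with the UTF-8 spelling. Assumes, as GCC and Clang do, that
   both spellings name the same identifier. */
double scaled(double x)
{
    return aₚ * x;
}
```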

7. Joker_vD
My language uses Cyrillic and I personally prefer English-based keywords and variable names precisely because they are not words of my (human) language. It introduces an easy and obvious distinction between the machine-oriented and the human-oriented.
8. kevincox
Being able to program in languages that don't fit into ASCII is a good idea. Using one-character variable names is a bad idea.
9. OkayPhysicist
Why shouldn't they be? It's not the '00s anymore; Unicode support is universal. You'd have to dust off some truly ancient tech to find something incapable of rendering it.

Source code is for humans, and thus should be written in whatever way makes it easiest to read, write, and understand for humans. If your language doesn't map onto ASCII, then Unicode support improves that goal. If your code is meant to directly implement some physics formula, then using the appropriate Unicode characters might make it easier to read (and thus spot transcription errors, something I find far too often in physics simulations).

10. adrianN
Using variable names that are different but render (almost) the same can be a bad idea.
11. OkayPhysicist
> Little to no source code is written for single (human) language development teams.

This is blatantly false. I'd posit that a solid 90% of all source code is written by single, co-located teams (a substantial portion of which are teams of 1). That certainly fits the bill for most companies I've worked at.

12. wheybags
Hot take, but I've always felt the world would be better served if mathematicians and physicists would stop using terrible short variable names and use longCamelCaseDescriptiveNames like the rest of us, because paper is cheap, and abbreviations are confusing. I know it's nicer when you're writing by hand, but when you clean up a proof or formula for publishing, would it really be so hard to switch to descriptive names?

I'm a practitioner of neither though, so I can't condemn the practice wholeheartedly as an outsider, but it does make me groan.

13. someplaceguy
> using the appropriate unicode characters might make it easier to read

It's probably also a great way to introduce almost undetectable security vulnerabilities by using Unicode characters that look similar to each other but in fact are different.

14. senbrow
Long names are good for short expressions, but they obfuscate complex ones because the identifiers visually crowd out the operators.

This can be especially difficult if the author is trying to map 1:1 to a complex algorithm in a white paper that uses domain-standard mathematical notation.

The alternative is to break the "full formula" into simpler expression chunks, but then naming those partial expression results descriptively can be even more challenging.
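To make the trade-off concrete, here is a minimal C sketch of one formula written the three ways described above: terse notation, long names inline, and split into named partial results. The formula is the total energy of a mass on a spring, E = ½mv² + ½kx². It and all function names are my own illustrative choices, not from the thread.

```c
/* 1. Terse, paper-style notation: the operators stand out. */
double energy_terse(double m, double v, double k, double x)
{
    return 0.5 * m * v * v + 0.5 * k * x * x;
}

/* 2. Long descriptive names inline: the names crowd out the operators. */
double energy_verbose(double mass, double velocity,
                      double spring_constant, double displacement)
{
    return 0.5 * mass * velocity * velocity
         + 0.5 * spring_constant * displacement * displacement;
}

/* 3. The formula split into named partial results. */
double energy_chunked(double mass, double velocity,
                      double spring_constant, double displacement)
{
    double kinetic_energy = 0.5 * mass * velocity * velocity;
    double elastic_potential_energy =
        0.5 * spring_constant * displacement * displacement;
    return kinetic_energy + elastic_potential_energy;
}
```

Here the chunks happen to have obvious physical names. With a longer formula from a paper, finding such names for the intermediate results is the hard part the comment describes.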

15. OkayPhysicist
When I was doing a lot of physics simulation in Julia, I had a Vim extension which would just allow me to type something like \gamma, hit Tab, and get γ. This was worth the (minimal) hassle, because it made it very easy to spot-check formulas. When you're shuffling data around in a loosely described space like most of web dev, descriptive function and variable names are important because the description of what you're doing and what you're doing it to is the important information, and the actual operations you're performing are typically close to trivial.
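The completion described here amounts to a lookup from a backslash name to a UTF-8 string; Julia's REPL offers the same `\gamma`-then-Tab behavior. Below is a minimal, hypothetical C sketch. The table has only a few entries, and `latex_to_unicode` is an illustrative name, not any editor's actual API.

```c
#include <stddef.h>
#include <string.h>

/* A few LaTeX-style names and the UTF-8 text each one expands to. */
static const struct {
    const char *name;    /* what the user types, e.g. "\\gamma" */
    const char *symbol;  /* what replaces it after Tab */
} latex_symbols[] = {
    { "\\gamma",      "γ" },
    { "\\mu",         "μ" },
    { "\\pi",         "π" },
    { "\\varepsilon", "ε" },
};

/* Returns the symbol for an exact name match, or NULL if the name is unknown. */
const char *latex_to_unicode(const char *name)
{
    for (size_t i = 0; i < sizeof latex_symbols / sizeof latex_symbols[0]; i++)
        if (strcmp(latex_symbols[i].name, name) == 0)
            return latex_symbols[i].symbol;
    return NULL;
}
```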

In heavily mathematical contexts, most of those assumptions get turned on their head. Anybody qualified to be modifying a model of electromagnetism is going to be intimately familiar with the language of the formulas: mu for permeability, epsilon for permittivity, etc. With that shared context,

1/(4*π*ε)*(q_electron * q_proton)/r^2 is going to be a lot easier to see, at a glance, as Coulomb's law

compared to

1 / (4 * Math.Pi * permittivity_of_free_space)*(charge_electron * charge_proton)/distance_of_separation^2

Source code, like any other language built for humans, is meant to be read by humans. If those humans have a shared context, utilizing that shared context improves the quality and ease of that communication.
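Here are the two versions above as C functions, with hypothetical names. This sketch assumes a compiler that accepts UTF-8 identifiers, e.g. GCC 10+ or Clang. The constants are the SI elementary charge, exact since the 2019 SI redefinition, and the CODATA 2018 value of the vacuum permittivity. C has no power operator: `^` is bitwise XOR (and not even valid on a `double`), so the square is written `r * r`.

```c
/* SI elementary charge (exact) and vacuum permittivity (CODATA 2018). */
static const double q_proton   =  1.602176634e-19;  /* C */
static const double q_electron = -1.602176634e-19;  /* C */
static const double π = 3.14159265358979323846;
static const double ε = 8.8541878128e-12;           /* F/m, permittivity of free space */

/* Coulomb force in newtons between an electron and a proton r metres apart;
   a negative result means the force is attractive. */
double coulomb_terse(double r)
{
    return 1 / (4 * π * ε) * (q_electron * q_proton) / (r * r);
}

/* The same computation, in the same operation order, with descriptive names. */
double coulomb_verbose(double distance_of_separation)
{
    const double pi = π;
    const double permittivity_of_free_space = ε;
    const double charge_electron = q_electron;
    const double charge_proton = q_proton;
    return 1 / (4 * pi * permittivity_of_free_space)
           * (charge_electron * charge_proton)
           / (distance_of_separation * distance_of_separation);
}
```

At r = 5.29e-11 m (roughly the Bohr radius), both functions give about -8.2e-8 N.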

16. loeg
Math people shouldn't be allowed to write code. It's not the Unicode so much as the extremely terse variable names.
17. bigstrat2003
They shouldn't be precisely because it makes the code harder to read and write when you include non-ASCII characters.
18. perching_aix
Isn't that basically all C/C++ code? Admittedly I don't have much exposure to it, but it's pretty much a trope in and of itself, along with Java and C# suffering from the opposite problem.

Such a silly issue too, you'd think we'd have come up with some automated wrangling for this, so that those experienced with a codebase can switch over and see super short versions of identifiers, while people new to it all will see the long stuff.

19. OkayPhysicist
This would cause your compilation to fail, unless you were deliberately declaring and using near identical symbols. Which would violate the whole "Code is meant to be easily read by humans" thing.
20. someplaceguy
> unless you were deliberately declaring and using near identical symbols.

Yes, that would probably be one way to do it.

> Which would violate the whole "Code is meant to be easily read by humans" thing.

I'd think someone who's deliberately and sneakily introducing a security vulnerability would want it to be undetectable, rather than easily readable.
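Here is a minimal sketch of that scenario, using a hypothetical function. The second `а` below is CYRILLIC SMALL LETTER A (U+0430), not LATIN SMALL LETTER A (U+0061). Both are valid C identifier characters, so this compiles as ordinary C, and most fonts render the two declarations identically. As far as I know, tools such as clang-tidy's `misc-confusable-identifiers` check exist to flag exactly this kind of pair.

```c
static int root_checks = 0;  /* counts how often the root check passed */

/* Hypothetical access check that appears to grant privileges only to uid 0. */
int is_privileged(int uid)
{
    int a = (uid == 0);  /* LATIN SMALL LETTER A (U+0061): the intended check */
    int а = 1;           /* CYRILLIC SMALL LETTER A (U+0430): a different variable */
    root_checks += a;
    return а;            /* reads like `return a;` but always returns 1 */
}
```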

21. eqvinox
Hrm. Fair point. But will the other humans, even if they have the shared context, also have the ability to type in these symbols, if they want to edit the code? They probably don't have your vim extension…

I guess maybe this is an argument for better UI/UX for symbolic input…

22. nsingh2
Better for students and those unfamiliar with the field, but noisy for those familiar with it. Considering that much mathematical work is done with pen and paper, it would be a total pain to write out huge variable names every time.

Consider a simple programming example: in C, blocks are delimited by `{}`. Why not use `block_begin` and `block_end`? Because it's noisy, and it doesn't take much to internalize the meaning of braces.
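C actually ships an alternative spelling of braces, though a terser one than `block_begin`. Since the 1995 amendment (C95), the digraphs `<%` `%>` stand for `{` `}`, `<:` `:>` for `[` `]`, and `%:` for `#`. They were originally meant for keyboards and character sets lacking those symbols. Below is a small sketch with a hypothetical `sum_of_squares` function. It compiles as standard C, though this spelling is rarely seen in practice.

```c
%:define SQUARE(x) ((x) * (x))   /* %: is the digraph for # */

/* Sum of squares of the first n entries: <% %> are { }, <: :> are [ ]. */
int sum_of_squares(const int values<::>, int n)
<%
    int total = 0;
    for (int i = 0; i < n; i++)
    <%
        total += SQUARE(values<:i:>);
    %>
    return total;
%>
```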

23. SV_BubbleTime
> void recip(double* aₚ, double* řₚ)
> {
>     for (;;)
>     {
>         register double Π = (aₚ)(řₚ);

My first thought before I saw this was: “I wonder if this is going to be an article from people who build things, or something from ‘academics’ who don’t.”

At least it was answered quickly.

24. ZoomZoomZoom
I know what you mean and I shudder when I see code that uses words from my native lang, but most code is human-oriented.
25. RossBencina
Mathematics is a language that doesn't fit into ASCII and commonly uses one-character variable names. If you are implementing a documented mathematical algorithm (i.e. one with a description in a paper or book) then sticking to the notation of the paper (i.e. using one character variable names) makes sense to me.
26. mananaysiempre
Unfortunately, many of the things of this nature that you’ll want to implement use indices, which are inevitably going to start at 1. So you’ve still got plenty of hours of unpleasant debugging ahead of you, and a non-obvious correspondence to the original paper at the end of it.
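Here is a minimal sketch of the mismatch, using a hypothetical formula of the kind papers use: S = Σ i·xᵢ for i = 1, …, n. Because the index value itself appears in the formula, the loop follows the paper's 1-based i while the array access is shifted by one.

```c
#include <stddef.h>

/* Hypothetical paper formula: S = sum over i = 1..n of i * x_i, where the
   paper's x_1 .. x_n are stored in the C array as x[0] .. x[n - 1]. */
double weighted_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 1; i <= n; i++)   /* loop over the paper's indices */
        s += (double)i * x[i - 1];    /* the paper's x_i lives at x[i - 1] */
    return s;
}
```

A naive port, `for (i = 0; i < n; i++) s += i * x[i];`, computes 0·x₁ + 1·x₂ + 2·x₃ and so on instead. That is exactly the kind of off-by-one error the comment warns about.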
27. kevincox
I find math far easier to read when the authors use proper names for variables. But I understand that it isn't the idiomatic style and agree that it can be useful to match the paper when re-implementing an algorithm.
28. flohofwoe
> Isn't that basically all C/C++ code?

Maybe for code that was written in the early '90s, but the only 'tradition' that has survived is calling the vanilla loop variable 'i'.

29. cryptonector
Yes, I also think the whole world should program in English.

That's half tongue in cheek. I am fluent in three languages, but I program "in English" and I greatly appreciate that my colleagues who are fluent in languages other than the ones I'm fluent in (except English) also do. Basically English is the world's lingua franca today. Nonetheless if a company in France wants to use French for their symbol names, or a company in Mexico wants to use Spanish for their symbol names, or a company in China wants to use Chinese for their symbol names, who am I to stop them?! Surely it's not my place.

30. cryptonector
> […could some of the downvoters explain why they're downvoting?]

Because you made false assertions ("And you pretty much can't type them on any keyboard").

31. eqvinox
Please show me the keyboard layout that has keys for ⁺, ř and ₚ.

(Unless you're being pedantic because I wrote "keyboard" rather than "keyboard layout", or ignored the qualifying "pretty much". In either of those cases you're unwilling to communicate cooperatively and I can't help you.)

32. cryptonector
Search for compose key sequences.
33. eqvinox
> Search for compose key sequences.

I don't need to do that because I actively use them myself and have a custom ~/.XCompose. Also, please try communicating less condescendingly.

There is no default compose sequence for ₚ that I can find, at least in my Debian installation.

So, again, please point me at the layout that can output these characters.

And even with that: if you don't think Compose sequences, possibly even custom, are covered by "pretty much impossible", I must seriously question your perception & bias of how common (or not) things are.