
The provenance memory model for C

(gustedt.wordpress.com)
225 points HexDecOctBin | 1 comments
zombot ◴[] No.44422335[source]
Does C allow Unicode identifiers now, or is that pseudo code? The code snippets also contain `&amp;`, so something definitely went wrong with the transcoding to HTML.
replies(4): >>44422382 #>>44422416 #>>44422634 #>>44424896 #
qsort ◴[] No.44422382[source]
Quoting cppreference:

An identifier is an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and Unicode characters specified using \u and \U escape notation (since C99), of class XID_Continue (since C23). A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode non-digit character (since C99) (until C23), or Unicode character of class XID_Start (since C23)). Identifiers are case-sensitive (lowercase and uppercase letters are distinct). Every identifier must conform to Normalization Form C (since C23).

In practice it depends on the compiler.

replies(1): >>44422453 #
dgrunwald ◴[] No.44422453[source]
But the source character set remains implementation-defined, so compilers do not have to directly support unicode names, only the escape notation.

Definitely a questionable choice to throw off readers with unicode weirdness in the very first code example.

replies(1): >>44422534 #
qsort ◴[] No.44422534[source]
If it were up to me, anything outside the basic character set in a source file would be a syntax error; I'm simply reporting what the spec says.
replies(2): >>44422647 #>>44423260 #
guipsp ◴[] No.44423260[source]
What a "basic character set" is depends on locale
replies(2): >>44423859 #>>44424381 #
account42 ◴[] No.44424381[source]
Anything except US-ASCII in source code outside comments and string constants should be a syntax error.
replies(1): >>44424740 #
guipsp ◴[] No.44424740[source]
You are aware other languages exist? Some of which don't even use the Latin script?
replies(3): >>44424911 #>>44426840 #>>44431566 #
account42 ◴[] No.44431566[source]
And those are not programming languages, or at least not the C programming language which only needs a very limited character set.
replies(1): >>44436431 #
steveklabnik ◴[] No.44436431[source]
C does allow for limited unicode in identifiers, though you need to use the \u prefix and write the code out. Compilers like clang let it work like C++ and follow TR31, though this is nonstandard.
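For illustration, a minimal sketch of the escape spelling, assuming a compiler that accepts universal character names in identifiers (recent GCC and Clang should). The names `été` and `summer_months` are made up for the example:

```c
#include <assert.h>

/* \u00e9 is the universal character name for U+00E9 (é), so this
   declares a file-scope variable whose identifier is "été". */
static int \u00e9t\u00e9 = 3;

/* Hypothetical helper: reads the variable through its escaped spelling. */
int summer_months(void)
{
    assert(\u00e9t\u00e9 > 0);   /* same identifier, same spelling */
    return \u00e9t\u00e9;
}
```

Calling `summer_months()` from any `main` returns the stored value. Whether the same identifier can also be typed directly as `été` depends on the source character set the compiler accepts.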
replies(1): >>44441249 #
1. account42 ◴[] No.44441249[source]
Yes, these are the relatively recent additions being discussed here. C and C++ managed just fine for ages without them before the committees decided that scoring brownie points with performative changes was more important than security and readability of source files.