
55 points by anqurvanillapy | 2 comments

Hi, I'm Anqur, a senior software engineer with a varied background in which C development was often a central part of my work. For example:

1) Games: a Chinese/Vietnamese game with a C/C++ server and client, and Lua for scripting [1].
2) Embedded systems: a switch/router whose network stack is written entirely in C [2].
3) (Networked) file systems: the CephFS client, which is a kernel module [3].

(The links include some incidental detail, but these are real projects I worked on.)

Recently there has been a heated discussion about Rust and C in the kernel, and one message [4] caught my attention. It talks about the Rust "experiment" in kernel development:

> I'd like to understand what the goal of this Rust "experiment" is: If we want to fix existing issues with memory safety we need to do that for existing code and find ways to retrofit it.

So for many years I have been thinking about a new C dialect that retrofits fixes for these problems onto C itself.

Sometimes big systems and software (e.g. operating systems, browsers, databases) can be built entirely in other languages like C++, Rust, D, or Zig. But, as I hinted above, switching languages isn't always feasible: writing a good filesystem client means writing a kernel module, i.e. providing a VFS implementation. (I do know about FUSE, but I believe it's better if one can use the VFS directly.)

And I still love C, for its unique "bare-bones" experience:

1) Just talk to the platform: almost every platform speaks C. Nothing like Rust's PAL (platform abstraction layer) is needed.
2) Just talk to other languages: C is the lingua franca (except that Go needs no libc by default). Not to mention that if I want WebAssembly to talk to Rust, `extern "C"` is needed in the Rust code.
3) Just a libc: it's widely available, and I write my own data structures carefully. Since one is usually writing some critical component of a bigger system in C, it's fine that there aren't many existing libraries to choose from.
4) I don't need over-generalized generics; my use of generics is quite limited.

So unlike a few `unsafe` blocks in safe Rust, I want something like a few "safe" regions in an ambient "unsafe" C dialect. I'm not saying unsafe is good or bad; I'm saying we shouldn't frame it as safe vs. unsafe at all. It's C itself: in C you wouldn't call anything "safe" or "unsafe".

I'm also experienced in implementing advanced type systems; some of my work includes:

1) A row-polymorphic JavaScript dialect [5].
2) A tiny theorem prover with Lean 4 syntax in less than 1K LOC [6].
3) A Rust dialect with reuse analysis [7].

Language features like generics, compile-time evaluation, traits/typeclasses, and bidirectional typechecking are trivial for me; I have implemented all of them in the projects above.

For the retrofitted C, these features initially come to mind:

1) Code generation directly to C; no LLVM IR, no machine code.
2) Modules, like C++20 modules, to eliminate the use of headers.
3) Compile-time evaluation and type-level computation, so that e.g. `malloc(int)` is actually a thing.
4) Tactics-like metaprogramming to generate definitions, acting like type-safe macros.
5) Quantitative types [8] to track the use of resources (pointers, FDs). The typechecker tells the user where `free` must be inserted in all possible positions; nothing like RAII.
6) Limited lifetime checking, though some people tell me lifetimes aren't needed in such a language.
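To make point 5 concrete, here's a plain-C sketch of the kind of program such a checker would analyze. The `@linear` annotation is purely hypothetical syntax, spelled as a comment so this remains ordinary C:

```c
#include <stdlib.h>

/* Sketch for feature 5: under a quantitative type discipline, the
   checker would treat `buf` as a linear resource that must be consumed
   (freed) exactly once on every path through the function, and would
   report each position where a `free` is missing. */
int sum_head(size_t n) {
    int *buf = malloc(n * sizeof *buf);  /* hypothetically: int * @linear buf */
    if (buf == NULL)
        return -1;      /* no resource was created, nothing to free */
    buf[0] = 42;
    int result = buf[0];
    free(buf);          /* the single consumption the checker demands */
    return result;
}
```

If the `free` were missing on either return path, the checker would flag that path rather than silently leaking, which is the "tell the user where to insert `free`" behavior described above.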

Any further insights? Should I kick off such a project? I'd very much appreciate your ideas.

[1]: https://vi.wikipedia.org/wiki/V%C3%B5_L%C3%A2m_Truy%E1%BB%81...

[2]: https://e.huawei.com/en/products/optical-access/ma5800

[3]: https://docs.ceph.com/en/reef/cephfs/

[4]: https://lore.kernel.org/rust-for-linux/Z7SwcnUzjZYfuJ4-@infr...

[5]: https://github.com/rowscript/rowscript

[6]: https://github.com/anqurvanillapy/TinyLean

[7]: https://github.com/SchrodingerZhu/reussir-lang

[8]: https://bentnib.org/quantitative-type-theory.html

sparkie No.43180578
The problem with existing attempts to fix C, like Cyclone, is that they create a new language, but what we really want is C, with improvements. The approach should not be to make a new language, but to augment C with optional new features which can be applied incrementally to existing codebases to improve them.

You should start with a plain old C compiler and add the features you want in ways that fully preserve backward compatibility. Code written with the new features should compile under existing C compilers, not only your own, without any change in semantics. Using an existing compiler rather than yours would just mean they're not taking advantage of the features you add.

To give an example, let's say you want to augment pointers with some kind of ownership semantics that your compiler can statically check. We can add some type qualifiers, in the same position as `restrict`.

    void * _Owned foo;
    void * _Shared bar;
We could make `_Owned` and `_Shared` keywords in the dialect compiled by your compiler, but we need the code to still work with an existing compiler. To fix this we can simply define them to mean nothing.

    #if defined(__MY_DIALECT__)
    #define _Owned __my_dialect_owned
    #define _Shared __my_dialect_shared
    #else
    #define _Owned 
    #define _Shared
    #endif
Now when you compile with your compiler, it can check that you're not performing a use-after-move; when you compile with an existing compiler, the code still compiles, but the checks are not done.
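For instance, a function signature annotated this way reads naturally and, under the empty fallback definitions, is plain C to any existing compiler. The function name and the transfer discipline in the comments are illustrative, not from any real dialect:

```c
#include <stdlib.h>
#include <string.h>

/* Fallback branch: the qualifiers expand to nothing, so any existing
   C compiler accepts the annotated code unchanged. */
#define _Owned
#define _Shared

/* The caller receives ownership of the result and must free it; the
   argument is only borrowed. A dialect compiler could statically check
   that the returned pointer is consumed exactly once by each caller. */
char * _Owned dup_string(const char * _Shared s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;   /* ownership transfers out */
}
```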

An alternative syntactic representation from the above could use `[[attributes]]` which are now part of the C standard, but attributes can only appear in certain places, whereas symbols defined by the preprocessor can appear anywhere.

---

An example of good retrofitting is C#'s addition of non-nullable reference types. Using non-nullability is optional, but it can be made the default. When not enabled globally, non-nullable types can be used explicitly with `X!`. We can gradually annotate an existing codebase to use non-nullable references, and once the full translation unit has been updated we can enable them by default globally, so that `X` means `X!` instead of `X?`. This approach lets us gradually improve a codebase without having to rewrite it for the new feature all at once.

Contrast this to Cyclone, which required you to update the full translation unit before the Cyclone compiler could utilize non-nullable types.

If we were to add non-nullable pointers to C, we could take an approach like the above, where we have `void * _Nullable` and `void * _Notnull`, with the default for a translation unit set by a `#pragma` - meaning `void *` without any annotation would default to nullable, but when the pragma is set, pointers become non-null by default. If you eventually convert a whole codebase to non-nullable pointers, you could enable that globally with a compiler switch and omit the pragmas, and from then on you would have to explicitly mark pointers that may be null with `_Nullable`.
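Notably, Clang already ships essentially this design as an extension: `_Nullable` and `_Nonnull` type qualifiers, plus `#pragma clang assume_nonnull begin`/`end` to flip the default over a region of a header. Portable code hides the qualifiers behind macros exactly as described above. A minimal sketch (the `NULLABLE`/`NOTNULL` macro names are my own; on non-Clang compilers they expand to nothing):

```c
#include <stddef.h>
#include <string.h>

#if defined(__clang__)
#define NULLABLE _Nullable
#define NOTNULL  _Nonnull
#else
/* Other compilers: the annotations vanish and this is plain C. */
#define NULLABLE
#define NOTNULL
#endif

/* With Clang's nullability warnings enabled, passing NULL where
   NOTNULL is required is flagged at compile time; other compilers
   simply ignore the qualifiers. */
size_t checked_len(const char * NOTNULL s) {
    return strlen(s);
}

const char * NULLABLE find_char(const char * NOTNULL s, char c) {
    return strchr(s, c);   /* may legitimately return NULL */
}
```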

---

An additional advantage of approaching it this way is that you can focus on the front-end facing features and leave the optimization to an existing compiler.

IMO this is the only sane approach to retrofitting C. You need to be a compatible superset of C. You also need ABI compatibility, because C is the lingua franca that other languages use to communicate with the OS.

I also think the C committee should stop adding new features to the standard until they've been proven in practice. While many of the proposals [1], such as those cleaning up various parts of the specification ("Slay some earthly demons"), are worthwhile, some contributors propose adding X, Y, Z without an actual implementation that can be experimented with, as if they're competing with each other to get their pet feature into the standard.

What would be ideal would be if we could take some C26 code and compile it with a C23 compiler, because they added features in ways like the above, where they give additional meaning to the new compiler, but are just annotations that perform no function when compiled with an old compiler.

New features should be implemented and utilized before being considered for standardization. Let various ideas compete and let the best ones win, because prematurely adding features just piles more and more technical debt into the language, and makes it more difficult to add improvements further down the line.

[1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/?C=M;O=D

replies(1): >>43185897 #
1. anqurvanillapy No.43185897
I really like your gradual approach. It sounds like gradual typing, but applied to more than just the typing.

I used to build many tools with the libclang Python bindings to automate refactoring chores. I don't remember whether it can expand macros, and I was told the lexer and preprocessor are entangled in Clang, so extending the existing framework would be quite hard.

I would definitely go in this direction, but for now it looks like a final boss; let me finish some early quests first.

replies(1): >>43187708 #
2. sparkie No.43187708
I'd recommend using goblint (https://github.com/goblint) as a starting point, rather than Clang.