←back to thread

55 points anqurvanillapy | 1 comments | | HN request time: 0.339s | source

Hi I'm Anqur, a senior software engineer with different backgrounds where development in C was often an important part of my work. E.g.

1) Game: A Chinese/Vietnam game with C/C++ for making server/client, Lua for scripting [1]. 2) Embedded systems: Switch/router with network stack all written in C [2]. 3) (Networked) file system: Ceph FS client, which is a kernel module. [3]

(I left some unnecessary details in links, but are true projects I used to work on.)

Recently, there's a hot topic about Rust and C in kernel and a message [4] just draws my attention, where it talks about the "Rust" experiment in kernel development:

> I'd like to understand what the goal of this Rust "experiment" is: If we want to fix existing issues with memory safety we need to do that for existing code and find ways to retrofit it.

So for many years, I keep thinking about having a new C dialect for retrofitting the problems, but of C itself.

Sometimes big systems and software (e.g. OS, browsers, databases) could be made entirely in different languages like C++, Rust, D, Zig, etc. But typically, like I slightly mentioned above, making a good filesystem client requires one to write kernel modules (i.e. to provide a VFS implementation. I do know FUSE, but I believe it's better if one could use VFS directly), it's not always feasible to switch languages.

And I still love C, for its unique "bare-bone" experience:

1) Just talk to the platform, almost all the platforms speak C. Nothing like Rust's PAL (platform-agnostic layer) is needed. 2) Just talk to other languages, C is the lingua franca (except Go needs no libc by default). Not to mention if I want WebAssembly to talk to Rust, `extern "C"` is need in Rust code. 3) Just a libc, widely available, write my own data structures carefully. Since usually one is writing some critical components of a bigger system in C, it's just okay there are not many choices of existing libraries to use. 4) I don't need an over-generalized generics functionality, use of generics is quite limited.

So unlike a few `unsafe` in a safe Rust, I want something like a few "safe" in an ambient "unsafe" C dialect. But I'm not saying "unsafe" is good or bad, I'm saying that "don't talk about unsafe vs safe", it's C itself, you wouldn't say anything is "safe" or "unsafe" in C.

Actually I'm also an expert on implementing advanced type systems, some of my works include:

1) A row-polymorphic JavaScript dialect [5]. 2) A tiny theorem prover with Lean 4 syntax in less than 1K LOC [6]. 3) A Rust dialect with reuse analysis [7].

Language features like generics, compile-time eval, trait/typeclass, bidirectional typechecking are trivial for me, I successfully implemented them above.

For the retrofitted C, these features initially come to my mind:

1) Code generation directly to C, no LLVM IR, no machine code. 2) Module, like C++20 module, to eliminate use of headers. 3) Compile-time eval, type-level computation, like `malloc(int)` is actually a thing. 4) Tactics-like metaprogramming to generate definitions, acting like type-safe macros. 5) Quantitative types [8] to track the use of resources (pointers, FDs). The typechecker tells the user how to insert `free` in all possible positions, don't do anything like RAII. 6) Limited lifetime checking, but some people tells me lifetime is not needed in such a language.

Any further insights? Shall I kickstart such project? Please I need your ideas very much.

[1]: https://vi.wikipedia.org/wiki/V%C3%B5_L%C3%A2m_Truy%E1%BB%81...

[2]: https://e.huawei.com/en/products/optical-access/ma5800

[3]: https://docs.ceph.com/en/reef/cephfs/

[4]: https://lore.kernel.org/rust-for-linux/Z7SwcnUzjZYfuJ4-@infr...

[5]: https://github.com/rowscript/rowscript

[6]: https://github.com/anqurvanillapy/TinyLean

[7]: https://github.com/SchrodingerZhu/reussir-lang

[8]: https://bentnib.org/quantitative-type-theory.html

Show context
ebiederm ◴[] No.43177333[source]
If the goal is something that can be used to improve existing C code, I have a few thoughts.

To get to memory safety with C:

- Add support for array bounds checking. Ideally with the compiler doing the heavy lifting and providing to itself that most runtime bounds checks are unnecessary.

- Implement trivial dependent types so the compiler can know about the array size field that is passed next to a pointer. AKA

void do_something(size_t size, entry_t ptr[size]);

- Enforce the restrict keyword. This is actually the tricky bit. I have some ideas for a language that is not C, but making it backwards compatible is beyond where I have gotten. My hint is separation logic.

- Allow types to change safely. So that free() can change the type of the pointer passed to it, to be a non-dereferencable pointer (whatever bits it has).

This is an idea from separation logic.

Allowing functions to change types of data safely could also be a safe solution to code that needs type punning today.

I think conceptually modules are great, but if your goal is source compatible changes that bring memory safety then something like modules is an unnecessary distraction.

Any changes that ultimately cannot be implemented in the default C compiler I don't think will be preferable to just rewriting the code in a more established language like Rust.

On the other hand I think we are in a local maxima with programming languages and type systems. With everyone busy recombining proven techniques in different ways instead of working on the hard problem of how to have assignment, threading, and memory safety. Plus how to do proofs of interesting program properties with things like asserts.

Unfortunately it appears that only through proof can programs be consistent enough that specific security concerns can be said to not be problems.

What I have seen of ADA Spark lately has been very tantalizing.

I have a personal project that I think I have solved the memory safety problem, while still allowing manual memory management and assignment. Unfortunately I am at a stage where everything is mostly clear in my head, but I haven't finished fleshing it out and proving the type system, so I really can't share it yet :-(.

While implementing modules, memory safety, type variables, and functions that can change the types of their argument pointers. I think I will end up with something simpler than C in most respects.

I keep going well that doesn't make any sense today, as I go through all of the details and ask why is something done the way it is done.

One of those questions is why doesn't C use modules.

replies(1): >>43185280 #
1. anqurvanillapy ◴[] No.43185280[source]
> On the other hand I think we are in a local maxima with programming languages and type systems.

I think I gotcha. Oh no.

> But I haven't finished fleshing it out and proving the type system, so I really can't share it yet.

Any profile I could follow to wait for this to happen some day? I've made several pen-and-paper attempts on some problems you just mentioned, and further ones would be "strict aliasing", "pointer provenance" and many which appeared in all the dicussions here. It feels like you've already done many stuff, but I can't find your profile anywhere.