←back to thread

55 points anqurvanillapy | 1 comments | | HN request time: 0.272s | source

Hi I'm Anqur, a senior software engineer with different backgrounds where development in C was often an important part of my work. E.g.

1) Game: A Chinese/Vietnam game with C/C++ for making server/client, Lua for scripting [1]. 2) Embedded systems: Switch/router with network stack all written in C [2]. 3) (Networked) file system: Ceph FS client, which is a kernel module. [3]

(I left some unnecessary details in links, but are true projects I used to work on.)

Recently, there's a hot topic about Rust and C in kernel and a message [4] just draws my attention, where it talks about the "Rust" experiment in kernel development:

> I'd like to understand what the goal of this Rust "experiment" is: If we want to fix existing issues with memory safety we need to do that for existing code and find ways to retrofit it.

So for many years, I keep thinking about having a new C dialect for retrofitting the problems, but of C itself.

Sometimes big systems and software (e.g. OS, browsers, databases) could be made entirely in different languages like C++, Rust, D, Zig, etc. But typically, like I slightly mentioned above, making a good filesystem client requires one to write kernel modules (i.e. to provide a VFS implementation. I do know FUSE, but I believe it's better if one could use VFS directly), it's not always feasible to switch languages.

And I still love C, for its unique "bare-bone" experience:

1) Just talk to the platform, almost all the platforms speak C. Nothing like Rust's PAL (platform-agnostic layer) is needed. 2) Just talk to other languages, C is the lingua franca (except Go needs no libc by default). Not to mention if I want WebAssembly to talk to Rust, `extern "C"` is need in Rust code. 3) Just a libc, widely available, write my own data structures carefully. Since usually one is writing some critical components of a bigger system in C, it's just okay there are not many choices of existing libraries to use. 4) I don't need an over-generalized generics functionality, use of generics is quite limited.

So unlike a few `unsafe` in a safe Rust, I want something like a few "safe" in an ambient "unsafe" C dialect. But I'm not saying "unsafe" is good or bad, I'm saying that "don't talk about unsafe vs safe", it's C itself, you wouldn't say anything is "safe" or "unsafe" in C.

Actually I'm also an expert on implementing advanced type systems, some of my works include:

1) A row-polymorphic JavaScript dialect [5]. 2) A tiny theorem prover with Lean 4 syntax in less than 1K LOC [6]. 3) A Rust dialect with reuse analysis [7].

Language features like generics, compile-time eval, trait/typeclass, bidirectional typechecking are trivial for me, I successfully implemented them above.

For the retrofitted C, these features initially come to my mind:

1) Code generation directly to C, no LLVM IR, no machine code. 2) Module, like C++20 module, to eliminate use of headers. 3) Compile-time eval, type-level computation, like `malloc(int)` is actually a thing. 4) Tactics-like metaprogramming to generate definitions, acting like type-safe macros. 5) Quantitative types [8] to track the use of resources (pointers, FDs). The typechecker tells the user how to insert `free` in all possible positions, don't do anything like RAII. 6) Limited lifetime checking, but some people tells me lifetime is not needed in such a language.

Any further insights? Shall I kickstart such project? Please I need your ideas very much.

[1]: https://vi.wikipedia.org/wiki/V%C3%B5_L%C3%A2m_Truy%E1%BB%81...

[2]: https://e.huawei.com/en/products/optical-access/ma5800

[3]: https://docs.ceph.com/en/reef/cephfs/

[4]: https://lore.kernel.org/rust-for-linux/Z7SwcnUzjZYfuJ4-@infr...

[5]: https://github.com/rowscript/rowscript

[6]: https://github.com/anqurvanillapy/TinyLean

[7]: https://github.com/SchrodingerZhu/reussir-lang

[8]: https://bentnib.org/quantitative-type-theory.html

Show context
pkkm ◴[] No.43176100[source]
I'm a lot less experienced than you, but since you're collecting ideas, I'll give my opinion.

For me personally, the biggest improvements that could be made to C aren't about advanced type system stuff. They're things that are technically simple but backwards compatibility makes them difficult in practice. In order of importance:

1) Get rid of null-terminated strings; introduce native slice and buffer types. A slice would be basically struct { T *ptr, size_t count } and a buffer would be struct { T *ptr, size_t count, size_t capacity }, though with dedicated syntax to make them ergonomic - perhaps T ^slice and T @buffer. We'd also want buffer -> slice -> pointer decay, beginof/endof/countof/capacityof operators, and of course good handling of type qualifiers.

2) Get rid of errno in favor of consistent out-of-band error handling that would be used in the standard library and recommended for user code too. That would probably involve using the return value for a status code and writing the actual result via a pointer: int do_stuff(T *result, ...).

3) Get rid of the strict aliasing rule.

4) Get rid of various tiny sources of UB. For example, standardize realloc to be equivalent to free when called with a length of 0.

Metaprogramming-wise, my biggest wish would be for a way to enrich programs and libraries with custom compile-time checks, written in plain procedural code rather than some convoluted meta-language. These checks would be very useful for libraries that accept custom (non-printf) format strings, for example. An opt-in linear type system would be nice too.

Tool-wise, I wish there was something that could tell me definitively whether a particular run of my program executed any UB or not. The simpler types of UB, like null pointer dereferences and integer overflows, can be detected now, but I'd also like to know about any violations of aliasing and pointer provenance rules.

replies(2): >>43181957 #>>43184665 #
1. Gibbon1 ◴[] No.43181957[source]
I highly agree that they need to add slice and buffer types to the standard headers. Especially true because the recently added counted_by attribute.

Definitely feel the strict aliasing rule should be opt in.

And there are a lot of small UB that can be eliminated here and there.

I'll add adding types as first class objects. Make typeof actually useful