Interview with Zig language creator Andrew Kelley [video]

logicchains ◴[27 Aug 20 12:40 UTC] No.24293046[source]▶

I work in HFT, and one of the key concerns when writing low-latency code is "is this code allocating memory, and if so, how can I stop it?" Zig is the perfect language for this use case because nothing in the standard library allocates implicitly; instead, anything that allocates requires the caller to pass in an allocator. The stdlib also provides a handy arena allocator, which is often the best choice.

This is a huge advantage over C++ and Rust, because it makes it much harder for e.g. the intern to write code that repeatedly creates a vector or dynamically allocated string in a loop. Or to use something like std::unordered_map or std::deque that allocates wantonly.
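
To make the allocator-passing pattern concrete, here is a minimal sketch. It assumes a recent Zig release where allocators are passed around as `std.mem.Allocator` values (the interface looked different when this thread was written), and `repeatByte` is a hypothetical helper invented for illustration, not something from the standard library.

```zig
const std = @import("std");

// Hypothetical helper: it can only allocate through the allocator
// the caller hands it, so the allocation is visible at the call site.
fn repeatByte(allocator: std.mem.Allocator, byte: u8, n: usize) ![]u8 {
    const buf = try allocator.alloc(u8, n);
    for (buf) |*b| b.* = byte;
    return buf;
}

pub fn main() !void {
    // One arena for the whole batch; the single deinit() releases everything.
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    var total: usize = 0;
    var i: usize = 0;
    while (i < 1000) : (i += 1) {
        const s = try repeatByte(allocator, 'x', 16);
        total += s.len; // no per-iteration free; the arena owns the memory
    }
    std.debug.print("bytes handed out: {d}\n", .{total});
}
```

A function that takes no allocator parameter has no obvious way to heap-allocate, which is the property being praised above.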

replies(8): >>24293328 #>>24293382 #>>24293469 #>>24293919 #>>24293952 #>>24294403 #>>24294507 #>>24298257 #

AsyncAwait ◴[27 Aug 20 13:13 UTC] No.24293328[source]▶

>>24293046 #

> This is a huge advantage over C++ and Rust, because it makes it much harder for e.g. the intern to write code that repeatedly creates a vector or dynamically allocated string in a loop. Or to use something like std::unordered_map or std::deque that allocates wantonly.

True. On the other hand, Zig makes a deliberate decision not to bother itself with memory safety too much, so it's a win-some, lose-some sort of situation.

replies(1): >>24293466 #

pron ◴[27 Aug 20 13:29 UTC] No.24293466[source]▶

>>24293328 #

> On the other hand, Zig makes a deliberate decision not to bother itself with memory safety too much

This is not true. Zig places a strong emphasis on memory safety, it just does so in a way that's very different from either Java's or Rust's. I wrote more about this here: https://news.ycombinator.com/item?id=24293329

replies(2): >>24293966 #>>24295336 #

littlestymaar ◴[27 Aug 20 14:21 UTC] No.24293966[source]▶

>>24293466 #

> It does so with runtime checks that are turned on in development and testing and turned off -- either globally or per code unit -- in production.

This isn't “memory safety”. By this reasoning you could say “C is memory safe if you use ASAN during debug”. It is exactly equivalent, except that Zig's checks are less powerful than the full suite of existing sanitizers for C, but they're enabled by default in debug mode, which is nice.

replies(1): >>24294039 #

pron ◴[27 Aug 20 14:30 UTC] No.24294039[source]▶

>>24293966 #

No, this is full memory safety enforced through runtime checks. ASAN does not give you that. Zig has only arrays and slices with known sizes and no pointer arithmetic (unless you explicitly appeal to unsafe operations).
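
As a rough illustration of what "slices with known sizes" means in practice, here is a small sketch, assuming a recent Zig release. The `sum` helper is hypothetical; the point is only that a slice carries its length with it, so indexing can be checked without a separate length argument or pointer arithmetic.

```zig
const std = @import("std");

// Hypothetical helper: sums a slice. The slice carries its own length,
// so no separate length parameter or pointer arithmetic is needed.
fn sum(values: []const u32) u32 {
    var total: u32 = 0;
    for (values) |v| total += v;
    return total;
}

test "slices know their length" {
    const data = [_]u32{ 1, 2, 3, 4, 5 };
    const middle: []const u32 = data[1..4]; // elements 2, 3, 4
    try std.testing.expectEqual(@as(usize, 3), middle.len);
    try std.testing.expectEqual(@as(u32, 9), sum(middle));

    // With a runtime index i, `middle[i]` for i >= middle.len panics in
    // Debug and ReleaseSafe builds instead of reading past the end.
}
```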

replies(2): >>24294204 #>>24294285 #

rfoo ◴[27 Aug 20 14:56 UTC] No.24294285[source]▶

>>24294039 #

I thought after Intel MPX we can all agree "memory safety" in modern languages is more about temporal stuff (i.e. use-after-free, etc) than bounds check, but maybe I'm wrong.

How do those runtime checks kill UAF?

replies(1): >>24294637 #

pron ◴[27 Aug 20 15:28 UTC] No.24294637[source]▶

>>24294285 #

> https://github.com/ziglang/zig/pull/5998

TBD :)

But here's one way that's currently being tried: https://github.com/ziglang/zig/pull/5998

replies(1): >>24298056 #

littlestymaar ◴[27 Aug 20 20:07 UTC] No.24298056[source]▶

>>24294637 #

(Small copy-paste error here: you posted the same link twice.)

Regarding https://github.com/ziglang/zig/pull/5998, here we're exactly in the realm of C: swapping the allocator for a custom one with additional bookkeeping to check for memory management issues. But it tanks performance, so you generally can't use it in production (and if you were in a situation where you'd do it anyway, you'd be better off with completely automatic memory management, AKA a GC).

replies(1): >>24298625 #

pron ◴[27 Aug 20 21:01 UTC] No.24298625[source]▶

>>24298056 #

No, Zig is not in the realm of C. Zig gives you full memory safety that you can then selectively turn off. Why is it useful? For the same reason tests are useful even if they don't give you sound guarantees, and are still the primary way of achieving correctness, even in Haskell or Rust. C does not and cannot do this the same way as Zig does, because C cannot be made safe (well, it can, but that's a whole other can of worms) while Zig can. So you make Zig safe, test it, and then remove the guardrails from the performance-critical bits after you're satisfied with their correctness.

Does it provide safety in the same manner Rust does? Absolutely not. Does it provide less correctness overall? Maybe, and maybe it provides more correctness, and maybe the same. It's hard to say without an empirical study. The problem is that sound guarantees often come at a cost -- for example, to language complexity and compilation speed -- that can have a negative effect on correctness.

replies(2): >>24298722 #>>24299513 #

littlestymaar ◴[27 Aug 20 21:13 UTC] No.24298722[source]▶

>>24298625 #

> C does not and cannot do this the same way as Zig does

Regarding the PR you just sent, I'd like to hear why you think it cannot be applied to C?

> So you make Zig safe, test it, and then remove the guardrails from the performance-critical bits after you're satisfied with their correctness.

This isn't safety… This is “we didn't find any memory issue while fuzzing the software” and you'd get the same guarantee: if your fuzzer didn't cause the memory issue, then it remains in your code in production, waiting to explode one day with some hard to debug Heisenbug that only occurs once in a million…

replies(1): >>24298913 #

pron ◴[27 Aug 20 21:29 UTC] No.24298913[source]▶

>>24298722 #

> Regarding the PR you just sent, I'd like to hear why you think it cannot be applied to C?

I didn't mean that a safe allocator cannot be used in C; I meant that C cannot be made memory safe in its entirety as simply as Zig can. Why? Because C has pointer arithmetic while (safe) Zig doesn't, Zig has slices while C doesn't, and C has non-typesafe casts while safe Zig doesn't.

> This isn't safety… This is “we didn't find any memory issue while fuzzing the software” and you'd get the same guarantee:

No, it's not the same guarantee. Fuzzing a C program will not find all the undefined behaviour that fuzzing a Zig program can, for the reasons I mentioned.

It is true that if you use unsafe Zig, i.e. turn off safety for a whole program or some sections of it, you lose the guarantees that safe Zig gives you, and unsafe Zig is indeed not safe (neither is unsafe Rust). But because of the way it's designed, Zig has a way of improving correctness even when safety is removed. This is a tradeoff for sure, but so are sound guarantees, that can have other negative effects on correctness.

replies(1): >>24300895 #

littlestymaar ◴[28 Aug 20 02:44 UTC] No.24300895[source]▶

>>24298913 #

> It is true that if you use unsafe Zig, i.e. turn off safety for a whole program or some sections of it, you lose the guarantees that safe Zig gives you, and unsafe Zig is indeed not safe

There is no such thing as unsafe and safe Zig. All Zig is unsafe, but you can add additional runtime checks (disabled by default in optimized builds) that will slow down your program when used. Using a specific allocator to detect UAF is something you may do in development, but almost surely never in production. And without it your code isn't memory-safe.

> Fuzzing a C program will not find all the undefined behaviour that fuzzing a Zig program can, for the reasons I mentioned.

Zig will have less UB than C, but there will still be UB lurking in your programs no matter how long you test them. Consider the following snippet (on mobile, so this may have stupid syntax errors):

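  const std = @import("std");
  // Written against the std.mem.Allocator interface of recent Zig releases,
  // where the allocator is passed by value.
  const Allocator = std.mem.Allocator;

  test "this is UB, but the test won't show it" {
    const allocator = std.heap.page_allocator;
    const buf = try allocator.alloc(u8, 10);
    ohNo(allocator, 42, buf);
    allocator.free(buf);
  }

  // foo is a u32 because 1337 does not fit in a u8.
  fn ohNo(allocator: Allocator, foo: u32, bar: []u8) void {
    if (foo == 1337) {
        // double free waiting to happen in production
        allocator.free(bar);
    }
  }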

If you never explicitly test the value “1337” during your debug session, you won't trigger the UB and you won't know it's there; then, when you ship your optimized build to production, you'll ship a program with UB in it.

replies(1): >>24302852 #

pron ◴[28 Aug 20 09:06 UTC] No.24302852[source]▶

>>24300895 #

> There is no such thing as unsafe and safe Zig. All Zig is unsafe, but you can add additional runtime checks

Zig is meant to ultimately give you full memory safety, that you can selectively turn off. In addition, there are specific unsafe operations -- clearly marked -- such as casting an integer to a pointer or other non-typesafe casts.

A code unit with safety checks on and without unsafe operations is what I call "safe Zig."
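
One way to picture "selectively turn off" is the `@setRuntimeSafety` builtin, which toggles runtime checks for the enclosing scope. The sketch below assumes a recent Zig release; `dotChecked` and `dotUnchecked` are hypothetical names used only for illustration.

```zig
const std = @import("std");

// Safety checks follow the build mode here (on in Debug and ReleaseSafe):
// a too-short `b` or an overflowing multiply/add would panic.
fn dotChecked(a: []const u32, b: []const u32) u32 {
    var acc: u32 = 0;
    var i: usize = 0;
    while (i < a.len) : (i += 1) acc += a[i] * b[i];
    return acc;
}

// Same logic with checks disabled for this scope only.
// If `b` is shorter than `a`, this is undefined behaviour, not a panic.
fn dotUnchecked(a: []const u32, b: []const u32) u32 {
    @setRuntimeSafety(false);
    var acc: u32 = 0;
    var i: usize = 0;
    while (i < a.len) : (i += 1) acc += a[i] * b[i];
    return acc;
}

test "both variants agree on valid input" {
    const a = [_]u32{ 1, 2, 3 };
    const b = [_]u32{ 4, 5, 6 };
    try std.testing.expectEqual(@as(u32, 32), dotChecked(&a, &b));
    try std.testing.expectEqual(dotChecked(&a, &b), dotUnchecked(&a, &b));
}
```

The workflow described in this thread would be to test with the checked form and switch off checks only in scopes that profiling shows to be hot.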

> And without it your code isn't memory-safe.

This is simply not true. Perhaps you mean that you don't have a guarantee that your code is memory-safe, but that's not the same thing.

Our goal is not to write in a language with certain guarantees but to write programs with certain properties, say, without buffer overflows. One way of achieving such a program is to write it in a language that guarantees no such error can happen. Another is to write it in a language that guarantees no such error can happen in development, do some testing, and then remove the guarantees. In the second case it is true that our confidence in the lack of such errors is lower than the first, but in each case it is not 100%, and because the static guarantees are costly, it is possible that the second approach is even more effective at getting to more correct programs overall. They're both common ways for achieving the same goal.

As someone who works with formal methods, we make these tradeoffs in formal verification all the time. It is simply false that sound guarantees are always the best way to correctness -- it would be if they were free, but they're not.

Once you realise that the goal is achieving some desired level of confidence (which is never 100%, as that cannot exist in a physical system anyway) about overall program correctness -- which includes both "safety" and functional properties, each further divided into degrees of severity -- you see that there is no obvious way with the best effectiveness at achieving that goal.

> If you never explicitly test the value “1337” during your debug session, you won't trigger the UB and you won't know it's there

But here, again, you are looking at something in isolation. Because Zig is a simple language, the chances of such paths existing without you noticing are lower; also, because the language is simpler it is easier to write concolic testers that would automatically detect this.

In fact, if such a "rare path" exists in a complex language that causes some functional bug -- ultimately, we don't care what bug breaks our program or leaves it open to security vulnerabilities -- there's a smaller chance that it will be discovered. Which is exactly what I mean by soundness coming at a cost. It guarantees the lack of certain bugs, but because it complicates the language, it can make other bugs more costly to detect.

replies(1): >>24303545 #

littlestymaar ◴[28 Aug 20 11:15 UTC] No.24303545[source]▶

>>24302852 #

> This is simply not true. Perhaps you mean that you don't have a guarantee that your code is memory-safe, but that's not the same thing.

“But people can write correct C code”. Correct Zig != memory safety. It's the opposite: MEMORY SAFETY IS THE GUARANTEE that your code won't have memory errors no matter how broken it is!

> Another is to write it in a language that guarantees no such error can happen in development, do some testing, and then remove the guarantees.

That's the same kind of design as C with ASan, TSan, MemSan etc. Yes, Zig is less broken than C, leading to fewer sources of memory issues, but for what matters most (Double Free, Use After Free[1], Data Races) Zig and C offer the same level of safety guarantees: none.

> As someone who works with formal methods, we make these tradeoffs in formal verification all the time. It is simply false that sound guarantees are always the best way to correctness -- it would be if they were free, but they're not.

This is a straw man: comparing compile-time enforced ownership (Rust borrowck) to formal methods doesn't make any more sense than comparing static typing to formal methods. It adds a lot of learning friction, but that's it. I just grepped my current 90kLoc Rust project. You know how many lifetime annotations ('x) there are in it? Fifty-four! That's one every 1666 lines. Please tell me again how much it cripples productivity and the ability to write correct code!

> Because Zig is a simple language, the chances of such paths existing without you noticing are lower;

If you ever try to use shared-memory parallelism, this kind of bug will be everywhere! It's simple: every call to allocator.free is a minefield.

> ultimately, we don't care what bug breaks our program or leaves it open to security vulnerabilities

Memory safety issues aren't just security vulnerabilities, more than anything they are horrible bugs to track down, and it costs tons of money.

> Which is exactly what I mean by soundness coming at a cost. It guarantees the lack of certain bugs, but because it complicates the language, it can make other bugs more costly to detect.

This is BS. A language isn't easier to debug just because it has few symbols or a simple syntax. Otherwise brainfuck would be the ultimate productivity tool. Semantics are what matter, and because it has UB, Zig's semantics are more complex than those of most languages out there. That's why C is one of the most complex languages ever in practice, even if it's really “simple” and easy to “learn”.

Again, don't get me wrong, I have nothing against Zig and I find it refreshing because it has tons of cool ergonomic tricks (and having a built-in sanitizer which “just works” out of the box in debug mode without any other programmer intervention is cool!). It's a nice programming language experiment that will probably inspire a lot of others, and it's probably a really cool language for C programmers who like to manage their memory by themselves and don't want the “nanny compiler” Rust has but still want a language with a modern look and feel: that's totally legit.

But memory safe, it isn't.

[1]: which by itself causes more than 30% of Google and Microsoft security issues! (https://www.zdnet.com/article/microsoft-70-percent-of-all-se... https://www.chromium.org/Home/chromium-security/memory-safet...)

replies(2): >>24304726 #>>24305249 #