
The provenance memory model for C

(gustedt.wordpress.com)
224 points HexDecOctBin | 6 comments
jvanderbot ◴[] No.44422693[source]
I love Rust, but I miss C. If C can be updated to make it generally socially acceptable for new projects, I'd happily go back for some decent subset of things I do. However, there's a lot of anxiety and even angst around using C in production code.
replies(6): >>44422779 #>>44423128 #>>44423371 #>>44423771 #>>44425323 #>>44433479 #
bnferguson ◴[] No.44423371[source]
Feels like Zig is starting to fill that role in some ways. Fewer sharp edges and a bit more safety than C, more modern approach, and even interops really well with C (even being possible to mix the two). Know a couple Rust devs that have said it seems to scratch that C itch while being more modern.

Of course it's still really nice to just have C itself being updated into something that's nicer to work with and easier to write safely, but Zig seems to be a decent other option.

replies(3): >>44423806 #>>44424327 #>>44425774 #
purplesyringa ◴[] No.44425774[source]
How close are Zig's safety guarantees to Rust's? Honest question; I don't follow Zig development. I can't take C seriously because it hasn't even bothered to define provenance until now, but as far as I'm aware, Zig doesn't even try to touch these topics.

Does Zig document the precise mechanics of noalias? Does it provide a mechanism for controllably exposing or not exposing provenance of a pointer? Does it specify the provenance ABA problem in atomics on compare-exchange somehow or is that undefined? Are there any plans to make allocation optimizations sound? (This is still a problem even in Rust land; you can write a program that is guaranteed to exhibit OOM according to the language spec, but LLVM outputs code that doesn't OOM.) Does it at least have a sanitizer like Miri to make sure UB (e.g. data races, type confusion, or aliasing problems) is absent?

If the answer to most of the above is "Zig doesn't care", why do people even consider it better than C?

replies(1): >>44426194 #
dnautics ◴[] No.44426194[source]
Safety-wise, Zig is better than C because, if you don't do "easily flaggable things"[0], it doesn't have buffer overruns (including protection for sentinel-terminated strings) or null pointer dereferences. Where this lies on the spectrum of "C to Rust" is a matter of judgement, but if I'm not mistaken that covers easily a majority of memory-safety-related CVEs. There's also no UB in Debug, Test, or ReleaseSafe modes. Note: you can opt out of ReleaseSafe on a function-by-function basis. IIUC noalias is safety-checked in Debug, Test, and ReleaseSafe.

In a sibling comment, I mentioned a proof of concept I did that, if I had the time to complete it properly, should give you near-Rust-level checking on memory safety, plus automatically flag sites where you need to inspect the code. At the point where you're using Miri, you're already bringing extra tooling into Rust, so in practice zig + zig-clr could be the equivalent of "what if you moved borrow checking from rustc into Miri".

[0] type erasure, or using known-dangerous types like C pointers or non-slice multi-pointers.

replies(1): >>44427336 #
tialaramex ◴[] No.44427336[source]
This is very much a "Draw the rest of the fucking owl" approach to safety.
replies(1): >>44427912 #
dnautics ◴[] No.44427912[source]
what percentage of CVEs are null pointer problems or buffer overflows? That's what percentage of the owl has been drawn. If someone (or me) builds out a proper zig-clr, then we get to, what? 90%. Great. Probably good enough, that's not far off from where rust is.
replies(1): >>44428393 #
comex ◴[] No.44428393[source]
Probably >50% of exploits these days target use-after-frees, not buffer overflows. I don’t have hard data though.

As for null pointer problems, while they may result in CVEs, they’re a pretty minor security concern since they generally only result in denial of service.

Edit 2: Here's some data: In an analysis by Google, the "most frequently exploited" vulnerability types for zero-day exploitation were use-after-free, command injection, and XSS [3]. Since command injection and XSS are not memory-unsafety vulnerabilities, that implies that use-after-frees are significantly more frequently exploited than other types of memory unsafety.

Edit: Zig previously had a GeneralPurposeAllocator that prevented use-after-frees of heap allocations by never reusing addresses. But apparently, four months ago [1], GeneralPurposeAllocator was renamed to DebugAllocator and a comment was added saying that the safety features "require the allocator to be quite slow and wasteful". No explicit reasoning was given for this change, but it seems to me like a concession that applications that need high performance generally shouldn't be using this type of allocator. In addition, it appears that use-after-free is not caught for stack allocations [2] or for allocations from some other types of allocators.

Note that almost the entire purpose of Rust's borrow checker is to prevent use-after-free. And the rest of its purpose is to prevent other issues that Zig also doesn't protect against: tagged-union type confusion and data races.

[1] https://github.com/ziglang/zig/commit/cd99ab32294a3c22f09615...

[2] https://github.com/ziglang/zig/issues/3180

[3] https://cloud.google.com/blog/topics/threat-intelligence/202...

replies(1): >>44429957 #
1. dnautics ◴[] No.44429957[source]
yeah I don't think the GPA is really a great strategy for detecting UAF, but it was a good try. It basically creates a new virtual page for each allocation, so the kernel gets involved and ?I think? there is more indirection for any given pointer access. So you can imagine why it wasn't great.

Anyways, I am optimistic that UAF can be prevented by static analysis:

https://www.youtube.com/watch?v=ZY_Z-aGbYm8

Note that since this sort of technique interfaces with the compiler, it will detect UAF in dependencies too (unless the dependency is shipped as a .so file), whether or not the dependency chooses to run the static analysis as part of their software quality control.

replies(1): >>44436348 #
2. comex ◴[] No.44436348[source]
Fair enough. In some sense you’re writing your own borrow checker. But (you may know this already) be warned: this has been tried many times for C++, with different levels of annotation burden imposed on programmers.

On one side are the many C++ “static analyzers” like Coverity or clang-analyzer, which work with unannotated C++ code. On the other side is the “Safe C++” proposal (safecpp.org), which is supposed to achieve full safety, but at the cost of basically transplanting Rust’s type system into C++, requiring all functions to have lifetime annotations, disallowing mutable aliasing, and replacing the entire standard library with a new one that follows those rules. Between those two extremes there have been tools like the C++ Core Guidelines Checker and Clang’s lifetimebound attribute, which require some level of annotations, and in turn provide some level of checking.

So far, none of these have been particularly successful in preventing memory safety vulnerabilities. Static analyzers are widely used in industry but only find a fraction of bugs. Safe C++ will probably be too unpopular to make it into the spec. The intermediate solutions have some fundamental issues (see [1], though it’s written by the author of Safe C++ and may be biased), and in practice haven’t really taken off.

But I admit that only the “static analyzer” side of the solution space has been extensively explored. The other projects are just experiments whose lack of adoption may be due to inertia as much as inherent lack of merit.

And Zig may be different… I’m not a Zig programmer, but I have the impression that compared to C++ it encourages fewer allocations and smaller codebases, both of which may make lifetime analysis more tractable. It’s also a much younger language whose audience is necessarily much more open to change.

So we’ll see. Good luck - I’d sure like to see more low-level languages offering memory safety.

[1] https://www.circle-lang.org/draft-profiles.html

replies(3): >>44436441 #>>44436589 #>>44465759 #
3. steveklabnik ◴[] No.44436441[source]
> Safe C++ will probably be too unpopular to make it into the spec.

Not just that, but the committee accepted a paper that basically says its design is against C++'s design principles, so it's effectively dead forever.

replies(1): >>44437031 #
4. tialaramex ◴[] No.44436589[source]
One of the key things about Sean's "Safe C++" is that, like Rust, it actually, technically works. If we write software in the Safe C++ dialect we get safe programs, just as we do when we write ordinary safe Rust (never invoking "unsafe"). WG21 didn't take Safe C++ and it will most likely now be a minor footnote in history, but it did really work.

"I think this could be possible" isn't an enabling technology. If you write hard SF it's maybe useful to distinguish things which could happen from those which can't, but for practical purposes it only matters if you actually did it. Sean's proposed "Safe C++" did it; Zig, today, has not.

There are other obstacles, like adoption, as we saw for "Safe C++", but they're all predicated on having the technology at all: you cannot adopt technologies which don't exist; that's just make-believe. And make-believe, I think, is the path WG21 has already set out on.

5. tialaramex ◴[] No.44437031{3}[source]
This was adopted as standing document SD-10 https://isocpp.org/std/standing-documents/sd-10-language-evo...

Here's somebody who was in the room explaining how this was agreed as standing policy for the C++ programming language.

"It was literally the last paper. Seen at the last hour. Of a really long week. Most everyone was elsewhere in other working group meetings assuming no meaningful work was going to happen."

6. dnautics ◴[] No.44465759[source]
> Good luck

Thanks! I think this could be implemented as a (3rd party?) compiler backend.

And yeah, if it gets done quickly enough (before 1.0?) it could get enough momentum that it gets accepted as "considered to be best practice".

Honestly, though, I think the big hurdle for C/C++ static analysis is that lots of dependencies get shipped around as .so's and once that happens it's sort of a black hole unless 1) the dependency's provider agrees to run the analysis or 2) you can easily shim to annotate what's going on in the library's headers. 2) is a pain in the ass, and begging for 1) can piss off the dependency's owner.