The provenance memory model for C

(gustedt.wordpress.com)
224 points HexDecOctBin | 38 comments
1. jvanderbot ◴[] No.44422693[source]
I love Rust, but I miss C. If C can be updated to make it generally socially acceptable for new projects, I'd happily go back for some decent subset of things I do. However, there's a lot of anxiety and even angst around using C in production code.
replies(6): >>44422779 #>>44423128 #>>44423371 #>>44423771 #>>44425323 #>>44433479 #
2. mikewarot ◴[] No.44422779[source]
If you can stomach the occasional Begin and End, and a far less confusing pointer syntax, Pascal might be the language for you. Free Pascal has some great string handling, so you never have to worry about allocating and freeing them, and they can store gigabytes of text, even Unicode. ;-)
replies(2): >>44422784 #>>44422867 #
3. tgv ◴[] No.44422784[source]
Or try Ada.
4. jvanderbot ◴[] No.44422867[source]
If my fellow devs cringe at C, imagine their reaction to Pascal
replies(1): >>44423210 #
5. flohofwoe ◴[] No.44423128[source]
> to make it generally socially acceptable for new projects...

Or better yet, don't let 'social pressure' influence your choice of programming language ;)

If your workplace has a clear rule to not use memory-unsafe languages for production code that's a different matter of course. But nothing can stop you from writing C code as a hobby - C99 and later is a very enjoyable and fun language.

replies(3): >>44423284 #>>44424118 #>>44424932 #
6. mikewarot ◴[] No.44423210{3}[source]
C has all the things to hate in a programming language

  CaSe Sensitivity
  Weird pointer syntax
  Lack of a separate assignment token
  Null terminated strings
  Macros - the evil scourge of the universe
On the plus side, it's installed everywhere, and it's not indent sensitive
replies(5): >>44423261 #>>44424125 #>>44424253 #>>44424648 #>>44430684 #
7. jvanderbot ◴[] No.44423261{4}[source]
At this point, you're talking to someone who isn't here
8. xxs ◴[] No.44423284[source]
I was about to reply that no amount of pressure can tell me how to program. C was totally fine for the ESP32.
9. bnferguson ◴[] No.44423371[source]
Feels like Zig is starting to fill that role in some ways. Fewer sharp edges and a bit more safety than C, more modern approach, and even interops really well with C (even being possible to mix the two). Know a couple Rust devs that have said it seems to scratch that C itch while being more modern.

Of course it's still really nice to just have C itself being updated into something that's nicer to work with and easier to write safely, but Zig seems to be a decent other option.

replies(3): >>44423806 #>>44424327 #>>44425774 #
10. modeless ◴[] No.44423771[source]
Fil-C is a modified version of Clang that makes C and C++ memory safe. It supports things you wouldn't expect to work like signal handling or setjmp/longjmp. It can compile real C projects like SQLite and OpenSSL with minimal to no changes, today. https://github.com/pizlonator/llvm-project-deluge/blob/delug...
replies(1): >>44427476 #
11. pjmlp ◴[] No.44423806[source]
As usual, the remark that much of Zig's safety over C has been present since the late 1970s in languages like Modula-2, Object Pascal and Ada, but sadly they weren't born with curly brackets, nor did they bring a free OS to the uni party.
12. TimorousBestie ◴[] No.44424118[source]
> Or better yet, don't let 'social pressure' influence your choice of programming language ;)

It’s hard. Programming is a social discipline, and the more people who work in a language, the more love it gets.

replies(1): >>44425134 #
13. ioasuncvinvaer ◴[] No.44424125{4}[source]
Except for null terminated strings these don't seem like major issues to me. Can you elaborate?
14. 1718627440 ◴[] No.44424253{4}[source]
> Lack of a separate assignment token

What does that mean?

replies(1): >>44424637 #
15. dnautics ◴[] No.44424327[source]
(self-promotion) in principle one should be able to implement a fairly mature pointer provenance checker for zig, without changing the language. A basic proof of concept (don't use this, branches and loops have not been implemented yet):

https://www.youtube.com/watch?v=ZY_Z-aGbYm8

16. kbolino ◴[] No.44424637{5}[source]
Assignment is =, which is too close to equality ==, and thus has been the source of bugs in the past, especially since C treats assignment as an expression and coerces lots of non-boolean values to true/false wherever a condition is expected (if, while, for). At least most compilers warn about this nowadays.
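
A minimal C sketch of the bug class described here. The function names are hypothetical, and the doubled parentheses in the buggy version are there only so the sketch compiles without a warning; the classic typo is written `if (*status = 0)`.

```c
#include <stdbool.h>

/* Buggy: `=` stores 0 into *status, and the stored value (0) is then
   coerced to false, so the "ok" branch can never run. */
bool is_ok_buggy(int *status) {
    if ((*status = 0)) {      /* assignment where a comparison was meant */
        return true;
    }
    return false;
}

/* Intended: `==` compares and leaves *status untouched. */
bool is_ok_fixed(const int *status) {
    return *status == 0;
}
```

Calling is_ok_buggy on a status of 5 returns false and silently overwrites the status with 0, while is_ok_fixed returns false and leaves the 5 in place.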
replies(1): >>44427573 #
17. zelphirkalt ◴[] No.44424648{4}[source]
You mean "mere string replacement macros, instead of hygienic macros", of course : )
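
To make the distinction concrete, here is a small C sketch of what plain textual replacement does to an argument with side effects. SQUARE_MACRO, square_fn, next_value and calls are made-up names for illustration.

```c
/* Textual macro: the argument text is pasted in twice,
   so any side effect in it runs twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* A function evaluates its argument exactly once. */
int square_fn(int x) {
    return x * x;
}

/* Hypothetical helper with a visible side effect: it counts its calls
   and returns the running count. */
int calls = 0;

int next_value(void) {
    return ++calls;
}
```

SQUARE_MACRO(next_value()) expands to ((next_value()) * (next_value())), so the counter advances twice, whereas square_fn(next_value()) advances it once. A hygienic macro system would not let the expansion duplicate or capture things this way without the author asking for it.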
18. Y_Y ◴[] No.44424932[source]
I don't want to summon WB, but honest-to-god, D is a good middle ground here.
19. spauldo ◴[] No.44425134{3}[source]
If you're on UNIX or working in the embedded space, C is still everywhere and gets lots of love. C tends to get lots of libraries anyway because everything can FFI to it.
20. uecker ◴[] No.44425323[source]
Do you really love Rust, or do you feel pressured to say so?
replies(2): >>44425848 #>>44428768 #
21. purplesyringa ◴[] No.44425774[source]
How close are Zig's safety guarantees to Rust's? Honest question; I don't follow Zig development. I can't take C seriously because it hasn't even bothered to define provenance until now, but as far as I'm aware, Zig doesn't even try to touch these topics.

Does Zig document the precise mechanics of noalias? Does it provide a mechanism for controllably exposing or not exposing provenance of a pointer? Does it specify the provenance ABA problem in atomics on compare-exchange somehow or is that undefined? Are there any plans to make allocation optimizations sound? (This is still a problem even in Rust land; you can write a program that is guaranteed to exhibit OOM according to the language spec, but LLVM outputs code that doesn't OOM.) Does it at least have a sanitizer like Miri to make sure UB (e.g. data races, type confusion, or aliasing problems) is absent?
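
For readers unfamiliar with "exposing" provenance, a rough C sketch of the idea follows. It assumes the common reading of the provenance model for C, where converting a pointer to an integer marks the pointed-to object as exposed, and a later integer-to-pointer conversion may pick that provenance up again. The helper names are hypothetical.

```c
#include <stdint.h>

/* Pointer-to-integer conversion: under the assumed model this
   "exposes" the provenance of the object p points to. */
uintptr_t expose(int *p) {
    return (uintptr_t)(void *)p;
}

/* Integer-to-pointer conversion: the result may take on the provenance
   of an exposed, still-live object at that address. */
int *from_exposed(uintptr_t addr) {
    return (int *)(void *)addr;
}

/* Round trip: write to x through a pointer rebuilt from an integer. */
int roundtrip_write(void) {
    int x = 1;
    uintptr_t addr = expose(&x);   /* x is now exposed */
    int *q = from_exposed(addr);   /* q may be used to access x */
    *q = 42;
    return x;
}
```

The interesting cases are the ones this sketch avoids: an integer that was never derived from an exposed pointer, or an address whose object has since died.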

If the answer to most of the above is "Zig doesn't care", why do people even consider it better than C?

replies(1): >>44426194 #
22. grg0 ◴[] No.44425848[source]
He grew up in a very stringent household. Everybody was writing Rust and he was like, "damn, I wish I could write C."
23. dnautics ◴[] No.44426194{3}[source]
Safety-wise, Zig is better than C because, if you don't do "easily flaggable things"[0], it doesn't have buffer overruns (including protection in the case of sentinel-terminated strings) or null pointer exceptions. Where this lies on the spectrum from C to Rust is a matter of judgement, but if I'm not mistaken these classes easily account for a majority of memory-safety related CVEs. There's also no UB in debug, test, or release-safe. Note: you can opt out of release-safe on a function-by-function basis. IIUC noalias is safety checked in debug, test, and release-safe.

In a sibling comment, I mentioned a proof of concept that, if I had the time to complete it and do it correctly, should give you near-Rust-level checking on memory safety, plus automatic flagging of sites where you need to inspect the code. At the point where you are using Miri, you're already bringing extra tooling into Rust, so in practice Zig + zig-clr could be the equivalent of "what if you moved borrow checking from rustc into Miri".

[0] type erasure, or using "known dangerous types, like c pointers, or non-slice multipointers".
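
As a rough illustration of the bounds checking mentioned above, here is a C sketch of a slice (pointer plus length) with a checked read. This is not how Zig implements it: ByteSlice and slice_get are hypothetical names, and where this sketch returns false, a safe Zig build would panic instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a pointer paired with its element count. */
typedef struct {
    const unsigned char *ptr;
    size_t len;
} ByteSlice;

/* Checked read: reports failure instead of reading out of bounds.
   On failure *out is left unchanged. */
bool slice_get(ByteSlice s, size_t index, unsigned char *out) {
    if (index >= s.len) {
        return false;
    }
    *out = s.ptr[index];
    return true;
}
```

Because the length travels with the pointer, the check needs no terminator scan, which is the main contrast with a null-terminated C string.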

replies(1): >>44427336 #
24. tialaramex ◴[] No.44427336{4}[source]
This is very much a "Draw the rest of the fucking owl" approach to safety.
replies(1): >>44427912 #
25. tialaramex ◴[] No.44427476[source]
Fil-C does seem like a quicker route if your existing idea was something like "rewrite it in Java" and it exists today whereas both C and C++ have only vague ambitions to deliver some future language which might meet your needs.

I will be very surprised if there's widespread adoption of Fil-C for many new projects though.

replies(1): >>44430422 #
26. tialaramex ◴[] No.44427573{6}[source]
Even with warnings this is just terrible. People need to stop inventing languages where "False" is true, or an empty container is false or other insane "coercions" of this kind.

True is true, and false is false. If you're wondering whether this Doodad is Wibbly, you should ask that question, not rely on a convention that Wibbly Doodads are somehow "truthy" while the non-Wibbly ones are not.

27. dnautics ◴[] No.44427912{5}[source]
what percentage of CVEs are null pointer problems or buffer overflows? That's what percentage of the owl has been drawn. If someone (or me) builds out a proper zig-clr, then we get to, what? 90%. Great. Probably good enough, that's not far off from where rust is.
replies(1): >>44428393 #
28. comex ◴[] No.44428393{6}[source]
Probably >50% of exploits these days target use-after-frees, not buffer overflows. I don’t have hard data though.

As for null pointer problems, while they may result in CVEs, they’re a pretty minor security concern since they generally only result in denial of service.

Edit 2: Here's some data: In an analysis by Google, the "most frequently exploited" vulnerability types for zero-day exploitation were use-after-free, command injection, and XSS [3]. Since command injection and XSS are not memory-unsafety vulnerabilities, that implies that use-after-frees are significantly more frequently exploited than other types of memory unsafety.

Edit: Zig previously had a GeneralPurposeAllocator that prevented use-after-frees of heap allocations by never reusing addresses. But apparently, four months ago [1], GeneralPurposeAllocator was renamed to DebugAllocator and a comment was added saying that the safety features "require the allocator to be quite slow and wasteful". No explicit reasoning was given for this change, but it seems to me like a concession that applications that need high performance generally shouldn't be using this type of allocator. In addition, it appears that use-after-free is not caught for stack allocations [2], or allocations from some other types of allocators.
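
To show why "never reuse an address" catches use-after-free and why it is wasteful, here is a toy C sketch. It is an illustration only, not Zig's allocator: all names are hypothetical, it uses a fixed pool of byte slots, and it relies on an explicit liveness query where a real allocator of this kind would rely on the freed memory becoming inaccessible.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOT_COUNT 8
#define SLOT_SIZE  32

/* Toy allocator: each slot is handed out at most once, never recycled.
   Slots are raw bytes; no alignment beyond unsigned char is promised. */
typedef struct {
    unsigned char storage[SLOT_COUNT * SLOT_SIZE];
    bool live[SLOT_COUNT];
    size_t next;                 /* index of the next never-used slot */
} NoReuseAllocator;

/* Returns a fresh slot, or NULL once every slot has been used. */
void *nr_alloc(NoReuseAllocator *a) {
    if (a->next >= SLOT_COUNT) {
        return NULL;             /* freed slots are NOT reclaimed */
    }
    size_t slot = a->next++;
    a->live[slot] = true;
    return &a->storage[slot * SLOT_SIZE];
}

/* Maps a pointer returned by nr_alloc back to its slot index. */
static size_t nr_slot_of(const NoReuseAllocator *a, const void *p) {
    return (size_t)((const unsigned char *)p - a->storage) / SLOT_SIZE;
}

/* Marks the slot dead; its address will never be handed out again. */
void nr_free(NoReuseAllocator *a, void *p) {
    a->live[nr_slot_of(a, p)] = false;
}

/* A stale pointer can always be recognised, because no later
   allocation can ever occupy the same address. */
bool nr_is_live(const NoReuseAllocator *a, const void *p) {
    return a->live[nr_slot_of(a, p)];
}
```

The cost is visible in nr_alloc: after SLOT_COUNT allocations the pool is exhausted even if everything has been freed, which is the toy version of the "slow and wasteful" trade-off quoted above.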

Note that almost the entire purpose of Rust's borrow checker is to prevent use-after-free. And the rest of its purpose is to prevent other issues that Zig also doesn't protect against: tagged-union type confusion and data races.
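
For reference, tagged-union type confusion in C terms: reading a union member other than the one the tag says is active. A minimal sketch of a checked accessor, with hypothetical type and function names:

```c
#include <stdbool.h>

typedef enum { VALUE_INT, VALUE_FLOAT } ValueTag;

/* A union plus a tag that records which member is currently active. */
typedef struct {
    ValueTag tag;
    union {
        int i;
        float f;
    } as;
} Value;

/* Checked accessor: reads as.i only when the tag says it is active.
   On a mismatch it returns false and leaves *out unchanged. */
bool value_get_int(const Value *v, int *out) {
    if (v->tag != VALUE_INT) {
        return false;
    }
    *out = v->as.i;
    return true;
}
```

Nothing in C forces callers through the accessor, so code can still read v->as.i directly while the tag says VALUE_FLOAT. The harder case raised above is a reference to the payload that outlives a change of tag, which a check at access time does not catch.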

[1] https://github.com/ziglang/zig/commit/cd99ab32294a3c22f09615...

[2] https://github.com/ziglang/zig/issues/3180.

[3] https://cloud.google.com/blog/topics/threat-intelligence/202...

replies(1): >>44429957 #
29. ◴[] No.44428768[source]
30. dnautics ◴[] No.44429957{7}[source]
Yeah, I don't think the GPA is really a great strategy for detecting UAF, but it was a good try. It basically creates a new virtual page for each allocation, so the kernel gets involved and (I think) there is more indirection for any given pointer access. So you can imagine why it wasn't great.

Anyways, I am optimistic that UAF can be prevented by static analysis:

https://www.youtube.com/watch?v=ZY_Z-aGbYm8

Note since this sort of technique interfaces with the compiler, unless the dependency is in a .so file, it will detect UAF in dependencies too, whether or not the dependency chooses to run the static analysis as part of their software quality control.

replies(1): >>44436348 #
31. cryptonector ◴[] No.44430422{3}[source]
A big stumbling block is that Fil-C requires all C in the program to be built with Fil-C, including all libraries. That means that Debian and such would need to either adopt Fil-C (perhaps for some distros) or ship Fil-C and non-Fil-C libraries for all pkgs with libraries. The alternative is that you have to build everything yourself, and this gets painful if you need to support ELFs/DLLs.
32. cryptonector ◴[] No.44430684{4}[source]
> C has all the things to hate in a programming language

> CaSe Sensitivity

Wait, what, you.. you want a case-insensitive language? Like SQL?

I love SQL, but please no more case-insensitive programming languages!

33. bmn__ ◴[] No.44433479[source]
https://github.com/tsoding/crust
34. comex ◴[] No.44436348{8}[source]
Fair enough. In some sense you’re writing your own borrow checker. But (you may know this already) be warned: this has been tried many times for C++, with different levels of annotation burden imposed on programmers.

On one side are the many C++ “static analyzers” like Coverity or clang-analyzer, which work with unannotated C++ code. On the other side is the “Safe C++” proposal (safecpp.org), which is supposed to achieve full safety, but at the cost of basically transplanting Rust’s type system into C++: requiring all functions to have lifetime annotations, disallowing mutable aliasing, and replacing the entire standard library with a new one that follows those rules. Between those two extremes there have been tools like the C++ Core Guidelines Checker and Clang’s lifetimebound attribute, which require some level of annotations, and in turn provide some level of checking.

So far, none of these have been particularly successful in preventing memory safety vulnerabilities. Static analyzers are widely used in industry but only find a fraction of bugs. Safe C++ will probably be too unpopular to make it into the spec. The intermediate solutions have some fundamental issues (see [1], though it’s written by the author of Safe C++ and may be biased), and in practice haven’t really taken off.

But I admit that only the “static analyzer” side of the solution space has been extensively explored. The other projects are just experiments whose lack of adoption may be due to inertia as much as inherent lack of merit.

And Zig may be different… I’m not a Zig programmer, but I have the impression that compared to C++ it encourages fewer allocations and smaller codebases, both of which may make lifetime analysis more tractable. It’s also a much younger language whose audience is necessarily much more open to change.

So we’ll see. Good luck - I’d sure like to see more low-level languages offering memory safety.

[1] https://www.circle-lang.org/draft-profiles.html

replies(3): >>44436441 #>>44436589 #>>44465759 #
35. steveklabnik ◴[] No.44436441{9}[source]
> Safe C++ will probably be too unpopular to make it into the spec.

Not just that, but the committee accepted a paper that basically says its design is against C++'s design principles, so it's effectively dead forever.

replies(1): >>44437031 #
36. tialaramex ◴[] No.44436589{9}[source]
One of the key things in Sean's "Safe C++" is that, like Rust, it actually technically works. If we write software in the safe C++ dialect we get safe programs just as if we write ordinary safe (rather than ever invoking "unsafe") Rust we get safe programs. WG21 didn't take Safe C++ and it will most likely now be a minor footnote in history, but it did really work.

"I think this could be possible" isn't an enabling technology. If you write hard SF it's maybe useful to distinguish things which could happen from those which can't, but for practical purposes it only matters if you actually did it. Sean's proposed "Safe C++" did it, Zig, today, did not.

There are other obstacles - like adoption, as we saw for "Safe C++" - but they're predicated on having the technology at all, you cannot adopt technologies which don't exist, that's just make believe. Which I think is already the path WG21 has set out on.

37. tialaramex ◴[] No.44437031{10}[source]
This was adopted as standing document SD-10 https://isocpp.org/std/standing-documents/sd-10-language-evo...

Here's somebody who was in the room explaining how this was agreed as standing policy for the C++ programming language.

"It was literally the last paper. Seen at the last hour. Of a really long week. Most everyone was elsewhere in other working group meetings assuming no meaningful work was going to happen."

38. dnautics ◴[] No.44465759{9}[source]
> Good luck

Thanks! I think this could be implemented as a (3rd party?) compiler backend.

And yeah, if it gets done quickly enough (before 1.0?) it could get enough momentum that it gets accepted as "considered to be best practice".

Honestly, though, I think the big hurdle for C/C++ static analysis is that lots of dependencies get shipped around as .so's and once that happens it's sort of a black hole unless 1) the dependency's provider agrees to run the analysis or 2) you can easily shim to annotate what's going on in the library's headers. 2) is a pain in the ass, and begging for 1) can piss off the dependency's owner.