However, for high-performance systems software specifically, objects often have intrinsically ambiguous ownership and lifetimes that are only resolvable at runtime. Rust has a pretty rigid view of such things. In these cases C++ is much more ergonomic because objects with these properties are essentially outside the Rust model.
In my own mental model, Rust is what Java maybe should have been. It makes too many compromises for low-level systems code such that it has poor ergonomics for that use case.
What is the evidence for this? Plenty of high-performance systems software (browsers, kernels, web servers, you name it) has been written in Rust. Also Rust does support runtime borrow-checking with Rc<RefCell<_>>. It's just less ergonomic than references, but it works just fine.
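For illustration, a minimal sketch of what that runtime checking looks like (the Node type here is hypothetical, just to show shared ownership plus runtime-checked mutation):

    use std::cell::RefCell;
    use std::rc::Rc;

    // Hypothetical node whose ownership is decided at runtime: several parents
    // may keep it alive, and the aliasing rules are enforced dynamically.
    struct Node {
        value: i32,
        children: Vec<Rc<RefCell<Node>>>,
    }

    fn main() {
        let shared = Rc::new(RefCell::new(Node { value: 1, children: vec![] }));

        // Two owners of the same child; reference counting handles the lifetime.
        let a = Rc::new(RefCell::new(Node { value: 2, children: vec![shared.clone()] }));
        let b = Rc::new(RefCell::new(Node { value: 3, children: vec![shared.clone()] }));

        // Aliasing is checked at runtime: borrow_mut() panics (or try_borrow_mut()
        // returns Err) if a conflicting borrow is still live.
        shared.borrow_mut().value += 10;
        println!("{} {} {}", a.borrow().value, b.borrow().value, shared.borrow().value);
    }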
A trivial example is multiplication of large square matrices. An implementation needs to leverage all available CPU cores, and the traditional way to do that, found in many BLAS libraries, is to compute different tiles of the output matrix on different CPU cores. A tile is not a contiguous slice of memory; it's a rectangular segment of a dense 2D array. Writing different tiles of the same matrix in parallel is trivial in C++, very hard in Rust.
The near impossibility of building a competitive high-performance I/O scheduler in safe Rust is almost a trope at this point in serious performance-engineering circles.
To be clear, C++ is not exactly comfortable with this either, but it acknowledges that these cases exist and provides tools to manage them. Rust, not so much.
Thankfully C#, the other language I enjoy using, has mostly caught up with those languages.
Beyond that, there's the usual human factor in programming-language adoption.
Most of my applications are written in C#.
C# provides memory-safety guarantees very comparable to Rust's, and some of its other safety guarantees are better (one example is the compiler option to convert integer overflows into runtime exceptions). It's a higher-level language with a great, feature-rich standard library; even large projects compile in a few seconds; async IO is usable; there are good-quality GUI frameworks… Replacing C# with Rust would not be a benefit.
For your concrete example of subdividing matrices, that seems like it should be fairly straightforward in Rust too, if you convert your mutable reference to the data into a pointer, wrap your pointer-arithmetic shenanigans in an unsafe block, and add a comment at the top saying more or less "this is safe because the different subprograms are always operating on disjoint subsets of the data, and therefore no mutable aliasing can occur"?
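A minimal sketch of that approach, assuming a row-major square matrix stored in a Vec<f64> and a hypothetical fill_tiles_in_parallel helper; the justification for the unsafe block is exactly the disjoint-tiles argument above:

    // Sketch only: fill disjoint tiles of a row-major n x n matrix from several threads.
    // Safety rests entirely on the tiles never overlapping, which the compiler can't see.
    fn fill_tiles_in_parallel(matrix: &mut [f64], n: usize, tile: usize) {
        let base = matrix.as_mut_ptr() as usize; // pass the pointer as usize so the closures are Send

        std::thread::scope(|s| {
            for tile_row in (0..n).step_by(tile) {
                for tile_col in (0..n).step_by(tile) {
                    s.spawn(move || {
                        let ptr = base as *mut f64;
                        for r in tile_row..(tile_row + tile).min(n) {
                            for c in tile_col..(tile_col + tile).min(n) {
                                // SAFETY: each (r, c) belongs to exactly one tile, so no two
                                // threads ever touch the same element - no mutable aliasing.
                                unsafe { *ptr.add(r * n + c) = (r * n + c) as f64 };
                            }
                        }
                    });
                }
            }
        });
    }

    fn main() {
        let n = 8;
        let mut m = vec![0.0; n * n];
        fill_tiles_in_parallel(&mut m, n, 4);
        assert!(m.iter().enumerate().all(|(i, &v)| v == i as f64));
    }

Scoped tasks from rayon or crossbeam would work the same way; the point is just that the disjointness proof lives in the comment rather than in the type system.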
FWIW, in the case where you're not separating code via a dynamic library boundary, you give the compiler an opportunity to optimise across those unsafe usages, e.g. to inline the unsafe code into its callers.
Yeah, and that model is rather old: https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

In practice, complex software systems have been written in multiple languages for decades. The requirements of performance-critical low-level components and of high-level logic are too different, and they are in conflict.
> you give the compiler an opportunity to optimise across those unsafe usages
One workaround is better design of the DLL API. Instead of implementing performance-critical outer layers in C#, do so on the C++ side of the interop, possibly injecting C# dependencies via function pointers or an abstract interface.
Another option is to re-implement these smaller functions in C#. The modern .NET runtime is not terribly slow; it even supports SIMD intrinsics. You are unlikely to match the performance of an optimised C++ release build with LTO, but the C# version is unlikely to fall significantly short.
On some workloads (think calls that can't be inlined within a hot loop), I found LTO to be a requirement for C code to match C# performance, not the other way around. We've come a long way!
(If you ask whether there are any caveats - yes: the JIT is able to win additional perf points by not being constrained to SSE2/SSE4.2, and by shipping more heavily vectorized primitives out of the box, which allow single-line changes that outpace what the average C library has access to.)
Yeah, I observed that too. As far as I remember, that code did many small memory allocations, and .NET GC was faster than malloc.
However, last time I tested (on .NET 6 back then), for code which crunches numbers with AVX, my C++ with SIMD intrinsics was faster than C# with SIMD intrinsics. Not by much, but noticeably - like 20%. The code generator was just better in C++. I suspect the main reason is that the .NET JIT compiler doesn't have time for expensive optimisations.
Yeah, there are heavy constraints on how many phases there are and how much work each phase can do. Besides the inlining budget, there are many hidden "limits" within the compiler which reduce the risk of throughput loss.
For example, the JIT can only track so many assertions about local variables at a time, and if a method has too many blocks, it may not track them perfectly across the full span of the method.
GCC and LLVM can leisurely repeat optimization phases, whereas RyuJIT avoids it (even if some phases replicate optimizations that happened earlier). This will change once the "Opt Repeat" feature gets productized[0]; we will most likely see it in NativeAOT first, as you'd expect.
On matching the codegen quality GCC produces for vectorized code - I'm usually able to replicate it by iteratively refactoring the implementation and quickly checking its disasm with the Disasmo extension. The main catch with this type of code is that GCC, LLVM and ILC/RyuJIT each have their own quirks around SIMD (e.g. does the compiler mistakenly rematerialize a vector constant's construction inside the loop body, undoing your hoisting of its load?). I used to think this was a weakness unique to .NET, but then I learned that GCC and LLVM tend to be vulnerable to it too, and even regress across updates, as sometimes happens in SIMD edge cases in .NET. But it is certainly not as common. What GCC/LLVM are better at is abstraction: once you start abstracting away your SIMD code, .NET may need more help. If you exhaust the available registers - sometimes due to less-than-optimal register allocation - you start getting spills, or you can run into technically-correct behavior around vector shuffles, where the JIT has to replicate portable semantics but fails to see that your constant doesn't need them, so you have to reach for platform-specific intrinsics to work around it.
This is the opposite of what I was suggesting, though; those function pointers or abstract interfaces inhibit exactly the optimisations I had in mind (e.g. inlining that enables dead-code elimination of bounds checks, or inlining comparison functions into sort implementations - the classics).
EDIT: that said, it's definitely still possible to keep it from hurting performance; it just takes being somewhat careful when designing the interface, which you don't have to think about if it's all in the same compiler/link step.