
Zlib-rs is faster than C

(trifectatech.org)
341 points dochtman | 3 comments
brianpane ◴[] No.43386768[source]
I contributed a number of performance patches to this release of zlib-rs. This was my first time doing perf work on a Rust project, so here are some things I learned:

Even in a project that uses `unsafe` for SIMD and internal buffers, Rust still provided guardrails that made it easier to iterate on optimizations. Abstraction boundaries helped here: a common idiom in the codebase is to cast a raw buffer to a Rust slice for processing, to enable more compile-time checking of lifetimes and array bounds.

The compiler pleasantly surprised me by doing optimizations I thought I’d have to do myself, such as optimizing away bounds checks for array accesses that could be proven correct at compile time. It also inlined functions aggressively, which enabled it to do common subexpression elimination across functions. Many times, I had an idea for a micro-optimization, but when I looked at the generated assembly I found the compiler had already done it.

Some of the performance improvements came from better cache locality. I had to use C-style structure declarations in one place to force fields that were commonly used together to inhabit the same cache line. For the rare cases where this is needed, it was helpful that Rust enabled it.

SIMD code is arch-specific and requires unsafe APIs. Hopefully this will get better in the future.

Memory-safety in the language was a piece of the project’s overall solution for shipping correct code. Test coverage and auditing were two other critical pieces.
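For readers unfamiliar with the raw-buffer-to-slice idiom mentioned above, here is a minimal sketch. It is not taken from zlib-rs: the function names `sum_raw` and `sum_bytes` and the byte-summing workload are hypothetical, chosen only to show where the `unsafe` boundary sits and why constant indices into fixed-size chunks are easy for the optimizer to prove in bounds.

```rust
use std::slice;

/// Hypothetical entry point taking a C-style (pointer, length) buffer.
///
/// # Safety
/// `ptr` must be non-null and valid for reads of `len` bytes for the
/// whole call, and the memory must not be mutated while we read it.
pub unsafe fn sum_raw(ptr: *const u8, len: usize) -> u32 {
    // The only unsafe step: reinterpret the raw buffer as a borrowed slice.
    let buf: &[u8] = unsafe { slice::from_raw_parts(ptr, len) };
    // From here on, lifetimes and bounds are checked like any safe code.
    sum_bytes(buf)
}

/// Safe worker: sums all bytes, wrapping on overflow.
pub fn sum_bytes(buf: &[u8]) -> u32 {
    let mut total = 0u32;
    let mut chunks = buf.chunks_exact(4);
    for chunk in &mut chunks {
        // Every chunk has exactly 4 elements, so indices 0..=3 are
        // provably in bounds and the checks can usually be removed.
        let part = u32::from(chunk[0])
            + u32::from(chunk[1])
            + u32::from(chunk[2])
            + u32::from(chunk[3]);
        total = total.wrapping_add(part);
    }
    // Handle the 0 to 3 leftover bytes that did not fill a chunk.
    for &b in chunks.remainder() {
        total = total.wrapping_add(u32::from(b));
    }
    total
}
```

Calling `sum_bytes(&[1, 2, 3, 4, 5, 6, 7])` goes through one full chunk plus a three-byte remainder. Whether the bounds checks are actually removed depends on the compiler version and optimization level, so inspecting the generated assembly, as the comment describes, is still the way to confirm it.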
replies(1): >>43387098 #
1. Boereck ◴[] No.43387098[source]
Interesting! I wonder if you have used PGO in the project? Forcing fields to be located next to each other kind of feels like something that PGO could do for you.
replies(1): >>43387865 #
2. brianpane ◴[] No.43387865[source]
I basically did manual PGO because I was also reducing the size of several integer fields at the same time to pack more into each cache line. I’m excited to try out the rustc+LLVM PGO for future optimizations.
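A rough sketch of what shrinking integer fields and fixing their order can look like. The struct and field names below are hypothetical and are not the actual zlib-rs state; the 64-byte cache line is also an assumption that holds on most x86-64 and many ARM cores but not everywhere. `#[repr(C)]` is what keeps the fields in declaration order, since the default Rust layout is free to reorder them.

```rust
use std::mem::size_of;

/// Assumed cache line size in bytes (not universal).
pub const CACHE_LINE: usize = 64;

/// Hypothetical "before" layout with generously sized fields.
#[repr(C)]
pub struct WideHot {
    pub window_pos: u64,
    pub hash: u64,
    pub match_len: u32,
    pub level: u32,
}

/// Hypothetical "after" layout: same fields, narrower integers,
/// kept adjacent and in declaration order by `repr(C)`.
#[repr(C)]
pub struct CompactHot {
    pub window_pos: u32,
    pub hash: u32,
    pub match_len: u16,
    pub level: u8,
    pub flags: u8,
}

/// How many whole values of `T` fit in one assumed cache line.
/// `T` must not be zero-sized, or the division panics.
pub fn per_cache_line<T>() -> usize {
    CACHE_LINE / size_of::<T>()
}
```

With these definitions `WideHot` occupies 24 bytes and `CompactHot` 12, so the compact version leaves far more room in the same line for neighbouring hot fields. Narrowing a field is only safe when its value range is known to fit, which is the kind of project-specific knowledge an automated layout pass would not have.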
replies(1): >>43389914 #
3. ofek ◴[] No.43389914[source]
A long-standing issue with that was just recently fixed: https://github.com/rust-lang/rust/pull/133250