Zlib-rs is faster than C

(trifectatech.org)
341 points dochtman | 43 comments
1. IshKebab ◴[] No.43381686[source]
It's barely faster. I would say it's more accurate to say it's as fast as C, which is still a great achievement.
replies(3): >>43381694 #>>43381776 #>>43381791 #
2. throwaway48476 ◴[] No.43381694[source]
But it is faster. The closer to theoretical maximum the smaller the gains become.
replies(1): >>43381768 #
3. mananaysiempre ◴[] No.43381768[source]
Zlib-ng is between a couple and multiple times away from the state of the art[1], it’s just that nobody has yet done the (hard) work of adjusting libdeflate[2] to a richer API than “complete buffer in, complete buffer out”.

[1] https://github.com/zlib-ng/zlib-ng/issues/1486

[2] https://github.com/ebiggers/libdeflate

4. qweqwe14 ◴[] No.43381776[source]
"Barely" or not is completely irrelevant. The fact is that it's measurably faster than the C implementation with the more common parameters. So the point that you're trying to make isn't clear tbh.

Also I'm pretty sure that the C implementation had more man hours put into it than the Rust one.

replies(1): >>43381891 #
5. ajross ◴[] No.43381791[source]
It's... basically written in C. I'm no expert on zlib/deflate or related algorithms, but digging around https://github.com/trifectatechfoundation/zlib-rs/ almost every block with meaningful logic is marked unsafe. There's raw allocation management, raw slicing of arrays, etc... This code looks and smells like C, and very much not like rust. I don't know that this is a direct transcription of the C code, but if you were to try something like that this is sort of what it would look like.

I think there's lots of value in wrapping a raw/unsafe implementation with a rust API, but that's not quite what most people think of when writing code "in rust".

replies(6): >>43381833 #>>43381841 #>>43381849 #>>43382402 #>>43383336 #>>43385335 #
6. hermanradtke ◴[] No.43381833[source]
> basically written in C

Unsafe Rust still has to conform to many of Rust’s rules. It is meaningfully different than C.

replies(2): >>43381882 #>>43382119 #
7. xxs ◴[] No.43381841[source]
I mentioned it under another comment, and while I consider myself versed enough in deflate, comparing the library to zlib-ng is quite weird, as the latter is generally hand-written assembly. Beating it would take some oddity in the test itself.
8. oneshtein ◴[] No.43381849[source]
I cannot understand your complaint. It is written in Rust, but to you it looks like C. So what?
replies(2): >>43382043 #>>43382101 #
9. est31 ◴[] No.43381882{3}[source]
It has also way less tooling available than C to analyze its safety.
replies(3): >>43382241 #>>43383742 #>>43385568 #
10. bee_rider ◴[] No.43381891[source]
I think that would be really hard to measure. In particular, for this sort of very optimized code, we’d want to separate out the time spent designing the algorithms (which the Rust version benefits from as well). Actually I don’t think that is possible at all (how will we separate out time spent coding experiments in C, then learning from them).

Fortunately these “which language is best” SLOC measuring contests are just frivolous little things that only silly people take seriously.

11. Alifatisk ◴[] No.43382043{3}[source]
So, it is basically like it was written in C.
replies(1): >>43385320 #
12. ajross ◴[] No.43382101{3}[source]
It doesn't exploit (and in fact deliberately evades) Rust's signature memory safety features. The impression from the headline is "Rust is as fast as C now!", but in fact the subset of the language that has been shown to be as fast as C is the subset that is basically isomorphic to C.

The impression a naive reader might take is that idiomatic/safe/best-practices Rust has now closed the performance gap. But clearly that's not happening here.

replies(1): >>43382366 #
13. ajross ◴[] No.43382119{3}[source]
Are there examples you're thinking about? The only good ones I can think of are bits about undefined behavior semantics, which frankly are very well covered in modern C code via tools like ubsan, etc...
replies(2): >>43382359 #>>43382384 #
14. nindalf ◴[] No.43382241{4}[source]
The number of tools matters less than the quality of the tools. Rust’s inherent guarantees + miri + software verification tools mean that in practice Rust code, even with unsafe, ends up being higher quality.
15. sedatk ◴[] No.43382359{4}[source]
This comment summarizes quite well how unsafe Rust differs. Basically, it is mostly safe Rust with a few exceptions, fewer than one would imagine: https://news.ycombinator.com/item?id=43382176
16. sedatk ◴[] No.43382366{4}[source]
Many of Rust's memory safety features (including the borrow checker) are still enabled inside unsafe blocks.

For more information: https://news.ycombinator.com/item?id=43382176

replies(1): >>43383496 #
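A minimal sketch of that point, not taken from zlib-rs: `unsafe` unlocks a handful of extra operations (such as dereferencing a raw pointer), but references used inside the block are still borrow-checked. The function name `bump_first` is hypothetical.

```rust
/// Increments the first element through a raw pointer and returns the new value.
/// (`bump_first` is an illustrative name, not part of zlib-rs.)
fn bump_first(values: &mut Vec<u32>) -> u32 {
    assert!(!values.is_empty());
    let first: *mut u32 = values.as_mut_ptr();
    unsafe {
        // Dereferencing a raw pointer is one of the few things `unsafe` permits.
        // SAFETY: the vector is non-empty, so `first` points at a valid element.
        *first += 1;

        // References are still checked in here. Uncommenting the three lines
        // below is rejected by the borrow checker (two live mutable borrows),
        // even though they sit inside an `unsafe` block:
        // let a = &mut values[0];
        // let b = &mut values[0];
        // *a += *b;
    }
    values[0]
}
```

The raw-pointer write is the part the author has to justify by hand; everything expressed through references keeps the usual compile-time checks.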
17. steveklabnik ◴[] No.43382384{4}[source]
They're just fundamentally different languages. There are semantics that exist in all four of these quadrants:

* defined in C, undefined in Rust

* undefined in C, undefined in Rust

* defined in Rust, undefined in C

* defined in Rust, defined in C

replies(1): >>43383481 #
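One commonly cited instance of the "defined in Rust, undefined in C" quadrant is signed integer overflow: it is undefined behavior in C, while Rust specifies the outcome (a panic when overflow checks are on, two's-complement wrapping when they are off) and offers explicit methods for each behavior. A small sketch; the function name `overflow_outcomes` is illustrative only.

```rust
/// Returns the checked, wrapping, and overflowing results of `x + 1`.
/// All three are fully defined for every `i32`, including `i32::MAX`.
fn overflow_outcomes(x: i32) -> (Option<i32>, i32, (i32, bool)) {
    (
        x.checked_add(1),     // None on overflow
        x.wrapping_add(1),    // wraps around to i32::MIN
        x.overflowing_add(1), // wrapped value plus an overflow flag
    )
}
```

Going the other way, two writable pointers to the same object are ordinary in C, whereas creating two live `&mut` references to the same object is undefined behavior in Rust.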
18. johnisgood ◴[] No.43382402[source]
It does actually seem like what a C -> Rust transpiler would spit out.
19. gf000 ◴[] No.43383336[source]
C is not assembly, nor is it portable assembly at all in this century, so your phrasing is very off.

C code will go through a huge number of transformations by the compiler, and unless you are a compiler expert you will have no idea how the resulting code looks. It's not targeting the PDP-11 anymore.

20. ajross ◴[] No.43383481{5}[source]
That doesn't seem responsive. The question wasn't whether Rust and C are literally the same language ("duh", as it were), it was effectively "are there meaningful safety features provided to the unsafe zlib-rs code in question that aren't already available in C toolchains/ecosystems?"

And there really aren't. The abbreviated/limited safety environment being exploited by this non-idiomatic Rust code seems to me to be basically isomorphic to the way you'd solve the problem in C.

replies(1): >>43383677 #
21. ajross ◴[] No.43383496{5}[source]
But again, not exploited by the code in question. This isn't using the Rust runtime heap, it's doing its own thing with raw pointers/indexing, and even seems to have its own allocator.
replies(2): >>43383632 #>>43385001 #
22. steveklabnik ◴[] No.43383632{6}[source]
> This isn't using the Rust runtime heap,

Rust does not have a specific "Rust runtime heap."

replies(1): >>43386173 #
23. steveklabnik ◴[] No.43383677{6}[source]
> it was effectively "are there meaningful safety features provided to the unsafe zlib-rs code in question that aren't already available in C toolchains/ecosystems?"

Ah, so that was like, not in your comment, but in a parent.

> And there really aren't.

I mean, not all of the code is unsafe. From a cursory glance, there's surely way more here than I see in most Rust packages, but that doesn't mean that you get no advantages. I picked a random file, and chose some random code out of it, and see this:

    pub fn copy<'a>(
        dest: &mut MaybeUninit<DeflateStream<'a>>,
        source: &mut DeflateStream<'a>,
    ) -> ReturnCode {
        // SAFETY: source and dest are both mutable references, so guaranteed not to overlap.
        // dest being a reference to maybe uninitialized memory makes a copy of 1 DeflateStream valid.
        unsafe {
            core::ptr::copy_nonoverlapping(source, dest.as_mut_ptr(), 1);
        }

The semantics of safe code, `&mut T`, provide the justification for why the unsafe code is okay. Heck, this code wouldn't even be legal in C, thanks to strict aliasing. (Well, I guess you could argue that in C code they'd be of the same type, since you don't have "might be uninitialized" in C's type system, but again, this is an invariant encoded in the type system that C can't express, so it's not possible to write this in C for that reason either.)
replies(1): >>43383764 #
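A self-contained sketch of the same pattern, for readers who want to compile it. `Stream` is a hypothetical stand-in for `DeflateStream`, and `copy_stream` only mirrors the shape of the excerpt above; it is not the zlib-rs implementation.

```rust
use core::mem::MaybeUninit;

/// Hypothetical stand-in for zlib-rs's `DeflateStream`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stream {
    total_in: u64,
    total_out: u64,
}

/// Bitwise-copies `source` into the possibly uninitialized `dest`.
fn copy_stream(dest: &mut MaybeUninit<Stream>, source: &mut Stream) {
    // SAFETY: `dest` and `source` are both exclusive references, so they
    // cannot overlap, and `dest` has room for exactly one `Stream`.
    unsafe {
        core::ptr::copy_nonoverlapping(source as *const Stream, dest.as_mut_ptr(), 1);
    }
}

/// Example caller: copies a stream into fresh, uninitialized storage.
fn demo_copy() -> Stream {
    let mut src = Stream { total_in: 10, total_out: 4 };
    let mut dst = MaybeUninit::<Stream>::uninit();
    copy_stream(&mut dst, &mut src);
    // SAFETY: `copy_stream` has just written a complete `Stream` into `dst`.
    unsafe { dst.assume_init() }
}
```

The `MaybeUninit` wrapper is what records "this memory may not hold a valid value yet" in the signature, which is the invariant the comment above says C's type system cannot encode.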
24. wyager ◴[] No.43383742{4}[source]
Miri is better than any C tool I'm aware of for runtime UB detection.
replies(1): >>43384505 #
25. ajross ◴[] No.43383764{7}[source]
Isn't that exactly my point though? This is just a memcpy(). In C, you do some analysis to prove to yourself that the pointers are valid[1]. In this unsafe Rust code, the author did some analysis to prove the same thing. I mean, sure, the specific analyses use words and jargon that are different. I don't think that's particularly notable. This is C code, written in Rust.

[1] FWIW, memcpy() arguments are declared restrict post-C99, the strict aliasing thing doesn't apply, for exactly the reason you're imagining.

replies(1): >>43383856 #
26. steveklabnik ◴[] No.43383856{8}[source]
> In C, you do some analysis to prove to yourself that the pointers are valid[1]

Right, and in Rust, you don't have to do it yourself: the language does it for you. If the signature were in C, you'd have to analyze the callers to make sure that this property is upheld when invoked. In Rust, the compiler does that for you.

> the strict aliasing thing doesn't apply

Yes, this is the case in this specific instance due to it being literally memcpy, but if it were any other function with the same signature, the problem would exist. Again, I picked some code at random, I'm not saying this one specific instance is even the best one. The broader point of "Rust has a type system that lets you encode more invariants than C's" is still broadly true.

replies(1): >>43384409 #
27. ajross ◴[] No.43384409{9}[source]
> In Rust, the compiler does that for you.

No it doesn't? That comment is expressing a human analysis. The compiler would allow you to stuff any pointer in that you want, even ones that overlap. You're right that some side effects of the runtime can be exploited to do that analysis. But that's true of C too! (Like, "these are two separate heap blocks", or "these are owned by two separate objects", etc...). Still human analysis.

Frankly you're overselling hard here. A human author can absolutely mess that analysis up, which is the whole reason Rust calls it "unsafe" to begin with.

replies(1): >>43384514 #
28. est31 ◴[] No.43384505{5}[source]
Miri is the closest to a UB specification for Rust that there is, coming in the form of a tool so you can run it. It's really cool but Valgrind, which is a C tool that also supports Rust, also supports Rust code that calls to C and that does I/O, both pretty common things for programs to do.
29. steveklabnik ◴[] No.43384514{10}[source]
I think you're misunderstanding what I'm claiming is being checked. I don't mean the unsafe block directly. I mean that `&mut T`s do not alias. That is checked by the compiler.

I'm saying that even in a codebase with a lot of unsafe, the checks that are still performed have value.

replies(1): >>43388428 #
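A hedged sketch of what "checked by the compiler" means at a call site. The helper names `copy_prefix` and `mirror_first_half` are made up for illustration; the point is that a caller cannot hand overlapping regions to a function whose signature takes an exclusive reference, so the unsafe body only needs to be audited once.

```rust
/// Copies as much of `src` as fits into the front of `dst`.
fn copy_prefix(dst: &mut [u8], src: &[u8]) {
    let n = src.len().min(dst.len());
    // SAFETY: `dst` is an exclusive borrow, so it cannot overlap `src`,
    // and both slices are valid for at least `n` bytes.
    unsafe { core::ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), n) };
}

/// Overwrites the second half of `buf` with a copy of the first half.
fn mirror_first_half(buf: &mut [u8]) {
    let mid = buf.len() / 2;

    // The borrow checker rejects this call, because `buf` would be borrowed
    // mutably and immutably at the same time:
    // copy_prefix(&mut buf[mid..], &buf[..mid]);

    // `split_at_mut` hands back two provably disjoint halves instead.
    let (front, back) = buf.split_at_mut(mid);
    copy_prefix(back, front);
}
```

The human analysis in the `SAFETY` comment still has to be right, but it leans on a property that every caller is forced to uphold.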
30. ComputerGuru ◴[] No.43385001{6}[source]
That is not correct; in another comment you can see where the code takes advantage of the rust-specific &mut notation to use a fast memcpy for non-overlapping pointers.
31. oneshtein ◴[] No.43385320{4}[source]
Yes, it's possible to write in Rust like in C. This code is an example of that. You can even use an automatic code converter to convert C into Rust.
32. solidsnack9000 ◴[] No.43385335[source]
I'm not sure why people say this about certain languages (it is sometimes said about Haskell, as well).

The code has a C style to it, but that doesn't mean it wasn't actually written in Rust -- Rust deliberately has features to support writing this kind of code, in concert with safer, stricter code.

Imagine if we applied this standard to C code. "Zlib-NG is basically written in assembler, not C..." https://github.com/zlib-ng/zlib-ng/blob/50e9ca06e29867a9014e...

replies(1): >>43387472 #
33. vlovich123 ◴[] No.43385568{4}[source]
The things I’ve seen broadly adopted in the industry (i.e. sanitizers) are equally available in Rust. & Rust’s testing infrastructure is standardized so tests are actually common to see in every library.
34. IshKebab ◴[] No.43386173{7}[source]
It does, it has a default global heap allocator.
replies(2): >>43386624 #>>43389482 #
35. simonask ◴[] No.43386624{8}[source]
That's not a "Rust runtime", that's an extension point. The default setting is `malloc()`.
36. ajross ◴[] No.43387472{3}[source]
> Imagine if we applied this standard to C code. "Zlib-NG is basically written in assembler, not C..."

We absolutely should, if someone claimed/implied-via-headline that naive C was natively as fast as hand-tuned assembly! This kind of context matters.

FWIW: I'm not talking about the assembly in zlib-rs, I was specifically limiting my analysis to the rust layers doing memory organization, etc... Discussing Rust is just exhausting. It's one digression after another, like the community can't just take a reasonable point ("zlib-rs isn't a good example of idiomatic rust performance") on its face.

replies(3): >>43388819 #>>43396152 #>>43396173 #
37. ajross ◴[] No.43388428{11}[source]
Sure, but C++ objects returned from operator new are likewise guaranteed not to alias. There's "value" there, but not a lot of value. And I repeat, you're overselling hard here. People who write rust like this are going to produce roughly the same amount of memory safety bugs, and pretending otherwise is frankly dangerous, IMHO.
replies(1): >>43390634 #
38. solidsnack9000 ◴[] No.43388819{4}[source]
I'm not sure anyone really believes `zlib-rs` is a good example of idiomatic Rust performance, though

Maybe the reason I think that is because I've written Rust for a variety of purposes (web application, database bindings, high performance parser) so I account for the "register" of Rust that is appropriate without thinking about it.

https://en.wikipedia.org/wiki/Register_(sociolinguistics)

It might be that a simple description like the headline leads some people to believe they could write Rust the easy way and get code that's as fast as writing "Rust the hard way".

However, that is different than what you earlier said -- "It's... basically written in C.". I have actually written Rust programs where some parts were literally written in C and linked in -- in order to build functioning plugins -- and there is a world of difference with that.

Regarding

> Discussing Rust is just exhausting. It's one digression after another, like the community can't just take a reasonable point ("zlib-rs isn't a good example of idiomatic rust performance") on its face.

I'm just not sure what to say to this. What do you expect from me, here?

39. steveklabnik ◴[] No.43389482{8}[source]
That's not part of Rust, that's a feature of its standard library. This is the same as C, where a freestanding implementation doesn't include malloc.

Put another way, there are no issues with a library using its own heap if it wants to.
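To make the "extension point" concrete, here is a minimal sketch using the standard library's `GlobalAlloc` trait and `#[global_allocator]` attribute. The `CountingAllocator` type and the `allocations_for_one_vec` helper are hypothetical; the allocator simply forwards to `System`, the platform default, while counting calls.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator that counts requests and defers to the system one.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds `GlobalAlloc::alloc`'s contract for `layout`.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this same `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Swaps the program-wide allocator used by `Box`, `Vec`, `String`, and so on.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Returns how many allocation requests were observed while creating one `Vec`.
fn allocations_for_one_vec() -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let v: Vec<u8> = Vec::with_capacity(64);
    // Keep the optimizer from eliding the allocation.
    std::hint::black_box(&v);
    ALLOCATIONS.load(Ordering::Relaxed) - before
}
```

None of this is required: a `no_std` crate, or a library that manages its own buffers, never touches the global allocator at all.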

40. sophacles ◴[] No.43390634{12}[source]
The difference is:

In C++ I could do something like:

    x_ptr = new object
    y_ptr = x_ptr
    copy(x_ptr, y_ptr)

In safe Rust there is no way to call the function in question if that sort of aliasing has happened. This means that if you get a bug from your copy, it's in the copy method: the possibility that it has been used inappropriately has been eliminated.

It reduces the search space for problems from: everywhere that created a pointer that is ultimately used in the copy, to: the copy function itself.

It reduces the number of programmers who have to keep the memory semantics of that copy in their head from "potentially everyone" to just "those who directly implement and check copy".

Pretending that has no value is absurd.

41. ◴[] No.43396152{4}[source]
42. hitekker ◴[] No.43396173{4}[source]
FWIW, I think your opinion is accurate, particularly regarding digressions. It's a common debate tactic for hyping a language under the thin veneer of technical discussion.
replies(1): >>43408267 #
43. solidsnack9000 ◴[] No.43408267{5}[source]
It seems like you're saying this thread is not constructive or not on topic. What could I have responded with that might have been better?