Zlib-rs is faster than C

(trifectatech.org)
341 points dochtman | 1 comments
YZF ◴[] No.43381858[source]
I found out I already know Rust:

        unsafe {
            let x_tmp0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x10);
            xmm_crc0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x01);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, x_tmp0);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, xmm_crc0);
Kidding aside, I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library. At what point does it really stop mattering if this is C or Rust?

Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?

Filligree ◴[] No.43381907[source]
The usual answer is: You only need to verify the unsafe blocks, not every block. Though 'unsafe' in Rust is actually even less safe than regular C, if a bit more predictable, so there's a crossover point where you really shouldn't have bothered.

The Rust compiler is indeed better than the C one, largely because of having more information and doing full-program optimisation. A `vec_foo = vec_foo.into_iter().map(...).collect::<Vec<foo>>()`, for example, isn't going to do any bounds checks or allocate.
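
For concreteness, a minimal sketch of that pattern. The `double_all` helper is hypothetical, made up for illustration. Whether the source allocation is actually reused by `collect` is a standard library implementation detail rather than a stability guarantee, so read "no allocation" as typical behaviour, not a promise:

```rust
/// Hypothetical helper (not from the thread): doubles every element.
/// The Vec is consumed by `into_iter`, mapped, and collected back into a
/// Vec of the same element type. Iterating this way involves no indexing,
/// so there are no per-element bounds checks to elide in the first place.
pub fn double_all(v: Vec<u32>) -> Vec<u32> {
    v.into_iter().map(|x| x * 2).collect::<Vec<u32>>()
}
```

The C equivalent would be a loop over a pointer and a length that writes in place, which also does no bounds checks or allocation; the difference is that here the compiler, not the programmer, is responsible for staying in bounds.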

johnisgood ◴[] No.43381960[source]
I have been told that "unsafe" affects code outside of that block, but hopefully steveklabnik may explain it better (again).

> isn't going to do any bounds checks or allocate.

You need to add an explicit bounds check or explicitly allocate in C, though. It is not there if you do not add it yourself.

1. steveklabnik ◴[] No.43382369[source]
> I have been told that "unsafe" affects code outside of that block, but hopefully steveklabnik may explain it better (again).

It's due to a couple of different things interacting with each other: unsafe code relies on invariants that safe code must also uphold, and the privacy boundary in Rust is the module.

Before we get into the unsafe stuff, I want you to consider an example. Is this Rust code okay?

    struct Foo {
        bar: usize,
    }
    
    impl Foo {
        fn set_bar(&mut self, bar: usize) {
            self.bar = bar;
        }
    }
No unsafe shenanigans here. This code is perfectly safe, if a bit useless.

Let's talk about unsafe. The canonical example of unsafe code being affected by code outside of the unsafe block itself is the implementation of Vec<T>. Vecs look something like this (the real code is different for reasons that don't really matter in this context):

    struct Vec<T> {
        ptr: *mut T,
        len: usize,
        cap: usize,
    }
The pointer is to a bunch of Ts in a row, the length is the current number of Ts that are valid, and the capacity is the total number of Ts that the allocation has room for. The length and the capacity are different so that memory allocation is amortized; the capacity is always greater than or equal to the length.

That property is very important! If the length is greater than the capacity, when we try and index into the Vec, we'd be accessing random memory.

So now, this function, which is the same as Foo::set_bar, is no longer okay:

    impl<T> Vec<T> {
        fn set_len(&mut self, len: usize) {
            self.len = len;
        }
    }
This is because the unsafe code inside of other methods of Vec<T> needs to be able to rely on the fact that len <= cap. And so you'll find that Vec<T>::set_len in Rust is marked as unsafe, even though it doesn't contain unsafe code. It still requires judicious use so as not to introduce memory unsafety.

And this is why the module being the privacy boundary matters: the only safe Rust code that can set len directly is code within the same privacy boundary as Vec<T> itself, that is, the same module or its children.
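
To make that concrete, here is a rough sketch using a toy type. `Buf`, `push`, `last`, and `demo` are all made up for illustration, and it uses a fixed array instead of a raw pointer, so it is a stand-in for the idea rather than how Vec<T> is actually written:

```rust
mod buf {
    /// Toy fixed-capacity buffer (hypothetical, not the real Vec<T>).
    pub struct Buf {
        data: [u8; 4], // storage; the capacity is data.len()
        len: usize,    // invariant: len <= data.len()
    }

    impl Buf {
        pub fn new() -> Self {
            Buf { data: [0; 4], len: 0 }
        }

        /// Safe: checks the capacity itself, so it cannot break the invariant.
        pub fn push(&mut self, byte: u8) -> bool {
            if self.len == self.data.len() {
                return false;
            }
            self.data[self.len] = byte;
            self.len += 1;
            true
        }

        /// Safe to call, but only sound while len <= capacity holds.
        pub fn last(&self) -> Option<u8> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: relies on the invariant len <= data.len().
            Some(unsafe { *self.data.get_unchecked(self.len - 1) })
        }

        /// Contains no unsafe operations, yet must be `unsafe`:
        /// a bad value here makes `last` read out of bounds.
        ///
        /// # Safety
        /// `len` must be less than or equal to the capacity.
        pub unsafe fn set_len(&mut self, len: usize) {
            self.len = len;
        }

        pub fn len(&self) -> usize {
            self.len
        }
    }
}

/// Code outside the module can only go through the public API.
pub fn demo() -> (usize, Option<u8>) {
    let mut b = buf::Buf::new();
    b.push(7);
    b.push(9);
    // b.len = 100; // error[E0616]: field `len` of struct `Buf` is private
    // SAFETY: 1 is within the capacity of 4.
    unsafe { b.set_len(1) };
    (b.len(), b.last())
}
```

Anything inside `mod buf` could still write `self.len = 100` in perfectly safe code, which is why the whole module, and not just the `unsafe` block in `last`, has to be reviewed.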