Zlib-rs is faster than C

(trifectatech.org)
341 points | dochtman
YZF ◴[] No.43381858[source]
I found out I already know Rust:

        unsafe {
            let x_tmp0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x10);
            xmm_crc0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x01);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, x_tmp0);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, xmm_crc0);
        }
Kidding aside, I thought the purpose of Rust was safety, but the keyword unsafe is sprinkled liberally throughout this library. At what point does it stop mattering whether this is C or Rust?

Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?
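
For reference, the usual shape of such code in Rust is something like the following sketch (hypothetical function names, using the const-generic immediate form of current std::arch; this is not zlib-rs's actual code):

    use std::arch::x86_64::*;

    // Hypothetical wrapper: the required CPU features are declared on the
    // function, and the immediates are const generics in current std::arch.
    #[target_feature(enable = "pclmulqdq,sse2")]
    unsafe fn fold_step(crc0: __m128i, crc1: __m128i, k: __m128i) -> __m128i {
        let t = _mm_clmulepi64_si128::<0x10>(crc0, k);
        let c = _mm_clmulepi64_si128::<0x01>(crc0, k);
        _mm_xor_si128(_mm_xor_si128(crc1, t), c)
    }

    fn fold_dispatch(crc0: __m128i, crc1: __m128i, k: __m128i) -> __m128i {
        assert!(is_x86_feature_detected!("pclmulqdq"));
        // Sound: the required CPU feature was verified just above.
        unsafe { fold_step(crc0, crc1, k) }
    }

Both rustc and clang lower these intrinsics through LLVM to the same PCLMULQDQ/PXOR instructions, so for code like this the question is less about the language than about the backend and flags.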

replies(30): >>43381895 #>>43381907 #>>43381922 #>>43381925 #>>43381928 #>>43381931 #>>43381934 #>>43381952 #>>43381971 #>>43381985 #>>43382004 #>>43382028 #>>43382110 #>>43382166 #>>43382503 #>>43382805 #>>43382836 #>>43383033 #>>43383096 #>>43383480 #>>43384867 #>>43385039 #>>43385521 #>>43385577 #>>43386151 #>>43386256 #>>43386389 #>>43387043 #>>43388529 #>>43392530 #
akx ◴[] No.43381928[source]
To quote the Rust book (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html):

  In addition, unsafe does not mean the code inside the
  block is necessarily dangerous or that it will definitely
  have memory safety problems: the intent is that as the
  programmer, you’ll ensure the code inside an unsafe block
  will access memory in a valid way.
Since you say you already know that much Rust, you can be that programmer!
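
For example (a toy sketch, not from any particular crate), this is the pattern the book is describing - the unsafe block is sound only because the check above it establishes the invariant the unchecked access needs:

    fn first_byte(bytes: &[u8]) -> Option<u8> {
        if bytes.is_empty() {
            return None;
        }
        // Sound: the emptiness check above guarantees index 0 is in bounds.
        Some(unsafe { *bytes.get_unchecked(0) })
    }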
replies(1): >>43382103 #
silisili ◴[] No.43382103[source]
I feel like C programmers had the same idea, and well, we see how that works out in practice.
replies(3): >>43382249 #>>43382631 #>>43386771 #
sunshowers ◴[] No.43382631[source]
No, C lacks encapsulation of unsafe code. This is very important. Encapsulation is the only way to scale local reasoning into global correctness.
replies(2): >>43385092 #>>43387548 #
DannyBee ◴[] No.43387548[source]
Hard disagree - if you violate the invariants in Rust unsafe code, you can cause global problems with local code. You can cause use-after-free and other borrow-checker violations with incorrect unsafe code. Nothing will flag it, you will have no idea which unsafe block is causing the issue, and debugging will be hard.
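
To make that concrete, here's a hypothetical sketch (not from any real crate) - it compiles, nothing flags it, and the UB lands on whatever safe caller touches the last byte:

    fn as_bytes_wrong(v: &Vec<u8>) -> &[u8] {
        // WRONG: the length invariant is off by one. The bug lives in this
        // unsafe block, but the crash (or silent corruption) shows up in
        // distant, perfectly safe code.
        unsafe { std::slice::from_raw_parts(v.as_ptr(), v.len() + 1) }
    }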

I have no idea what your definition of encapsulation is, but mine is not this.

It's really only encapsulated in the sense that if you have a finite and small set of unsafe blocks, you can audit them more easily and be pretty sure that your memory safety bugs are in there. That reality largely doesn't exist anymore because of how much unsafe is used, and since you have to audit all of it, whether it comes from a library or not, claiming encapsulation is not as useful as one thinks.

I do agree in theory that unsafe encapsulation was supposed to be a thing, but I think it's crazy at this point not to admit that unsafe blocks turned out to have much more global effects than people expected, in many more cases, and are used more readily than expected.

Saying "scaling reasoning" also implies someone reasoned about it, or can reason about it.

But the practical problem is the same in both cases - someone got the reasoning wrong and nothing flagged it.

Wanna go search GitHub for how many super popular libraries using unsafe had global correctness issues due to local unsafe blocks that a human reasoned incorrectly about, but that something like Miri found? Most of the unsafety that turned out to be buggy was also there for (unnecessary) performance reasons.

What you are saying is just something people tell themselves to make them feel okay about using unsafe all over the place.

If you want global correctness, something has to verify it - ideally not a human.

In the end, the thing C lacks is tools like Miri that can be used practically with low false positives, not "encapsulation" of unsafe code, which is trivially easy to perform in C.

Let's not kid ourselves here and end up building an ecosystem that is just as bad as the C one while our egos refuse to let us admit it. We should instead admit our problems and try to improve.

Unsafe also has legitimate use cases in Rust, for sure - but most unsafe code I look at does not need to exist, and is not better than unsafe C.

I'll give you an example: There are entire popular embedded bluetooth stacks in rust using unsafe global mutable variables and raw pointers and ..., across threads, for everything.
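
A hypothetical sketch of that shape (not taken from any named stack):

    static mut LINK_STATE: u32 = 0;

    // Offered to users as a "safe" function, but calling it from two
    // threads (or a thread plus an interrupt handler) is a data race,
    // and therefore undefined behavior.
    pub fn set_link_state(s: u32) {
        unsafe { LINK_STATE = s; }
    }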

This is not better than the C equivalent - in fact it's worse, because users think it is safe and it's very not.

At least nobody thinks the C version is safe. It will often therefore be shoved in a binary that is highly sandboxed/restricted/etc.

It would be one thing if this was in the process of being ported/translated from C. But it's not.

Using intrinsics that require alignment while the API was still being worked on - probably a reasonable use of unsafe (though it's still easy to cause global problems like buffer overflows if you screw up the alignment).

The bluetooth example - unreasonable.

replies(2): >>43389237 #>>43391195 #
burntsushi ◴[] No.43389237[source]
The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway.

The `memchr` crate, for example, has an entirely safe API. Nobody needs to use `unsafe` to use any part of it. But its internals have `unsafe` littered everywhere. Could the crate have bugs that result in UB due to a particular use of the `memchr` API? Yes! Doesn't that violate encapsulation? No! A bug inside an encapsulated boundary does not violate the very idea of encapsulation itself.

Encapsulation is about blame. It means that if `memchr` exposes a safe API, and if you use `memchr` and you get UB as a result of some `unsafe` code inside of `memchr`, then that means the problem is inside of `memchr`. The problem is definitively not with the caller using the library. That is, they aren't "holding it wrong."
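
As a toy illustration of that boundary (this is not `memchr`'s actual implementation): the unsafe is internal, and the public function maintains the invariant it needs, so no caller can trigger UB through it:

    /// Entirely safe public API, in the style of `memchr`.
    pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
        for i in 0..haystack.len() {
            // Sound: `i < haystack.len()` by construction. If this invariant
            // were ever wrong, the bug would be inside this function - the
            // blame could never fall on the caller.
            if unsafe { *haystack.get_unchecked(i) } == needle {
                return Some(i);
            }
        }
        None
    }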

I'm surprised that someone with as much experience as you is missing this nuance. How many times have you run into a C library API that has UB, reported the bug, and had the maintainer say, "sorry bro, but you're holding that shit wrong, your fault"? In Rust, the only way that ought (very specifically using ought and not is) to be true is if the API is tagged with `unsafe`.

Now, there are all sorts of caveats that don't change the overall point. "totally safe transmute" being an obvious demonstration of one of them[1] by fiddling with `/proc/self/mem`. And of course, Rust does have soundness bugs. But neither of these things change the fundamental idea of encapsulation.

And yes, one obvious shortcoming of this approach is that... well... people don't have to follow it! People can lie! I can expose a safe API, you can get UB and I can reject blame and say, "well you're holding it wrong." And thus, we're mostly back into how languages like C deal with these sorts of things. And that is indeed a bummer. And there are for sure examples of that in the ecosystem. But the glaring thing you've left out of your analysis is all of the crates that don't lie and specifically set out to provide a sound API.

The great thing about progress is that we don't have to be perfect. I'm really disappointed that you seem to be missing the forest for the trees here.

[1]: https://github.com/ben0x539/totally-safe-transmute/blob/main...

replies(1): >>43389748 #
DannyBee ◴[] No.43389748[source]
"The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway."

Well, no, actually. At least, not in an (IMHO) useful way.

I can break your safe API by getting the constraints wrong on unsafe code inside that API.

Also, unsafe usage elsewhere is not local. I can break your impossible-to-misuse API through an unsafe API that someone else used elsewhere, completely outside my control, and then wrapped in a safe API. Some of these are, of course, bugs in Rust/the compiler, etc. I'm just saying I've yet to hear the view that the ability to do this is always a bug in the language/compiler and will be destroyed on sight.

Beyond that:

To the degree this is useful encapsulation for tracking things down, it is only useful when the amount of unsafe is small enough that you can reason about it.

This is simply no longer true in any reasonably sized Rust app.

As a result, as you say, it is then only useful for assigning fault - for saying whether I'm holding it wrong. To me, that is basically worthless at scale.

"I'm surprised that someone with as much experience as you is missing this nuance."

I don't miss it - I just don't think it's as useful as claimed.

This level of "encapsulation", which provides no real guarantee except "the set of bugs is caused somewhere by the set of unsafe blocks", is fairly unhelpful at large scale.

I have audited hundreds of thousands of lines of Rust code to find bugs caused by unsafe usage. The thing that made it at all tractable was not this form of encapsulation - that was, in fact, 100% worthless at scale, because there was still tons and tons of code to reason about, across lots of libraries and dependencies. As you say, it only helps assign blame once a bug is found, and blame is not that useful at scale. It does not make the code safer. It does not make bugs easier to track down. It only declares, after I've spent all that time, that it is not my fault. But also, nobody has to do anything about it anyway.

For small programs, this buys you something, as I said: as long as the set of unsafe blocks is small enough to be tractable to audit, cool - you can find bugs more easily. In that sense, the tons of hobby programs, small libraries, etc., are a lot less likely to have bugs when written in Rust (modulo their dependencies on unsafe code).

But like, your position seems to be that it is fairly useful that I can go to a library and tell them "your crap is broken" and be right about it. To me, this does not buy a lot in the kinds of large, complex systems that Rust hopes to displace C/C++ in. (It also might be false.)

In actually tracking down the bugs, which is what I care about, the thing that was useful was that I could run Miri and lots of other tools on the code and get useful results that pointed me towards the most likely causes of issues.
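
(For concreteness, that workflow is roughly the following; Miri ships as a rustup component on nightly:)

    rustup +nightly component add miri
    cargo +nightly miri test    # run the test suite under Miri's
                                # interpreter, which flags UB such as
                                # out-of-bounds reads, use-after-free,
                                # and invalid aliasing in unsafe code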

So don't get me wrong - this is overall better than C, but writing lots of Rust (I haven't written C/C++ at all in a while, actually), I still tire of the constant claims about how safe Rust is. You are the rare Rust person who understands the nuance and is willing to admit there is any flaw or imperfection whatsoever.

As you say, there are lots of things that ought to be true in Rust that are not. You have a good understanding of this nuance and of where it fails.

But it is you, I believe, who is missing the forest for the trees, because most do not have this understanding.

I'll be concrete and, I guess, controversial in a way you are 100% free to disagree with, but I might as well throw a stake in the ground - it's Hacker News, might as well have fun making a comment someone can beat me over the head with later: if nothing changes, and the Rust ecosystem grows by a factor of 100x while changing nothing about how it behaves WRT unsafe usage, and no tooling gets significantly better, Rust will not end up better than C in practice. I don't mean it won't have fewer bugs/vulnerabilities - I think it would, by far!

But whether you have 100 billion of them or 1 billion of them, and thus made a 100x improvement, I don't think matters too much when it's still a billion :)

Meanwhile, if the Rust ecosystem got worse about unsafe but made tools like Miri 50x faster (and built more tools like it that help verification in practice), it would still end up better than C.

To me, it is the tooling, and not this sort of encapsulation, that will make a practical difference at scale.

The idea that you will convince people not to write broken unsafe code in ways that break safe APIs, or that the ability to assign blame matters, is very strange to me, and is no better than C. As systems grow, the likelihood of totally safe transmutes growing in them is basically 100% :)

FWIW - I also agree you don't have to be perfect, nor do I fault Rust for not being perfect. Instead, I simply disagree that, at scale, this sort of ability to place blame is useful. To me, it's the ability to find the bugs quickly, and in as automated a way as possible, that is useful.

I need to find the totally safe transmutes causing issues in my system, not hand it to someone else after determining it couldn't be my fault.

replies(2): >>43390293 #>>43391330 #
sunshowers ◴[] No.43391330[source]
Are you writing lots of FFI and/or embedded code? Those are the main places I see unsafe being used a lot.

The tooling and the encapsulation go hand in hand.

> The idea that you will convince people not to write broken unsafe code, in ways that breaks safe APIs, or that the ability to assign blame matters, is very strange to me, and is no better than C. As systems grow, the likelihood of totally safe transmutes growing in them is basically 100% :)

To be honest this doesn't track with my experience at all. Unsafe just isn't that commonly used in projects I contribute to. When it is, it is aggressively encapsulated.

replies(1): >>43394474 #
DannyBee ◴[] No.43394474[source]
Yes - I spend about half my time doing embedded Rust, where unsafe code is just everywhere, whether needed or not.

There is still plenty in my non-embedded stuff too, though a fair amount of it is hardware-adjacent (i.e., I have to drive things like relay cards, just from a desktop machine), to be fair.

But I've found plenty of broken unsafe in things like, uh, constraint solvers.

I would agree that useful and successful Rust projects aggressively encapsulate (and attempt to avoid) unsafe usage.

I will still maintain my belief that this will not be enough over time and at scale.