
Zlib-rs is faster than C

(trifectatech.org)
341 points | dochtman | 5 comments
YZF ◴[] No.43381858[source]
I found out I already know Rust:

        unsafe {
            let x_tmp0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x10);
            xmm_crc0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x01);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, x_tmp0);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, xmm_crc0);
        }
Kidding aside, I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library. At what point does it really stop mattering if this is C or Rust?

Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?

replies(30): >>43381895 #>>43381907 #>>43381922 #>>43381925 #>>43381928 #>>43381931 #>>43381934 #>>43381952 #>>43381971 #>>43381985 #>>43382004 #>>43382028 #>>43382110 #>>43382166 #>>43382503 #>>43382805 #>>43382836 #>>43383033 #>>43383096 #>>43383480 #>>43384867 #>>43385039 #>>43385521 #>>43385577 #>>43386151 #>>43386256 #>>43386389 #>>43387043 #>>43388529 #>>43392530 #
akx ◴[] No.43381928[source]
To quote the Rust book (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html):

  In addition, unsafe does not mean the code inside the
  block is necessarily dangerous or that it will definitely
  have memory safety problems: the intent is that as the
  programmer, you’ll ensure the code inside an unsafe block
  will access memory in a valid way.
Since you say you already know that much Rust, you can be that programmer!
replies(1): >>43382103 #
silisili ◴[] No.43382103[source]
I feel like C programmers had the same idea, and well, we see how that works out in practice.
replies(3): >>43382249 #>>43382631 #>>43386771 #
sunshowers ◴[] No.43382631[source]
No, C lacks encapsulation of unsafe code. This is very important. Encapsulation is the only way to scale local reasoning into global correctness.
replies(2): >>43385092 #>>43387548 #
DannyBee ◴[] No.43387548[source]
Hard disagree - if you violate the invariants in Rust unsafe code, you can cause global problems with local code. You can cause use-after-free, and other borrow checker violations, with incorrect unsafe code. Nothing will flag it, you will have no idea which unsafe code block is causing the issue, and debugging will be hard.

I have no idea what your definition of encapsulation is, but mine is not this.

It's really only encapsulated in the sense that if you have a finite and small set of unsafe blocks, you can audit them more easily and be pretty sure that your memory safety bugs are in there. This reality really doesn't exist much anymore because of how much unsafe is often used, and since you have to audit all of them, whether they come from a library or not, it's not as useful to claim encapsulation as one thinks.

I do agree in theory that unsafe encapsulation was supposed to be a thing, but i think it's crazy at this point to not admit that unsafe blocks turned out to easily have much more global effects than people expected, in many more cases, and are used more readily than expected.

Saying "scaling reasoning" also implies someone reasoned about it, or can reason about it.

But the practical problem is the same in both cases - someone got the reasoning wrong and nothing flagged it.

Wanna go search github for how many super popular libraries using unsafe had global correctness issues due to local unsafe blocks that a human reasoned incorrectly about, but something like miri found? Most of that unsafety that turned out to be buggy also was done for (unnecessary) performance reasons.

What you are saying is just something people tell themselves to make them feel okay about using unsafe all over the place.

If you want global correctness, something has to verify it, ideally not-human.

In the end, the thing C lacks is tools like miri that can be used practically with low false-positives, not "encapsulation" of unsafe code, which is trivially easy to perform in C.

Let's not kid ourselves here and end up building an ecosystem that is just as bad as the C one, but our egos refuse to allow us to admit it. We should instead admit our problems and try to improve.

Unsafe also has legitimate use cases in rust, for sure - but most unsafe code i look at does not need to exist, and is not better than unsafe C.

I'll give you an example: There are entire popular embedded bluetooth stacks in rust using unsafe global mutable variables and raw pointers and ..., across threads, for everything.

This is not better than the C equivalent - in fact it's worse, because users think it is safe and it's very not.

At least nobody thinks the C version is safe. It will often therefore be shoved in a binary that is highly sandboxed/restricted/etc.

It would be one thing if this was in the process of being ported/translated from C. But it's not.

Using intrinsics that require alignment while the API was still being worked on - probably a reasonable use of unsafe (though it's still easy to cause global problems like buffer overflows if you get the alignment wrong).

The bluetooth example - unreasonable.

replies(2): >>43389237 #>>43391195 #
burntsushi ◴[] No.43389237[source]
The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway.

The `memchr` crate, for example, has an entirely safe API. Nobody needs to use `unsafe` to use any part of it. But its internals have `unsafe` littered everywhere. Could the crate have bugs that result in UB due to a particular use of the `memchr` API? Yes! Doesn't that violate encapsulation? No! A bug inside an encapsulated boundary does not violate the very idea of encapsulation itself.

Encapsulation is about blame. It means that if `memchr` exposes a safe API, and if you use `memchr` and you get UB as a result of some `unsafe` code inside of `memchr`, then that means the problem is inside of `memchr`. The problem is definitively not with the caller using the library. That is, they aren't "holding it wrong."

I'm surprised that someone with as much experience as you is missing this nuance. How many times have you run into a C library API that has UB, you report the bug and the maintainer says, "sorry bro, but you're holding that shit wrong, your fault." In Rust, the only way that ought (very specifically using ought and not is) to be true is if the API is tagged with `unsafe`.

Now, there are all sorts of caveats that don't change the overall point. "totally safe transmute" being an obvious demonstration of one of them[1] by fiddling with `/proc/self/mem`. And of course, Rust does have soundness bugs. But neither of these things change the fundamental idea of encapsulation.

And yes, one obvious shortcoming of this approach is that... well... people don't have to follow it! People can lie! I can expose a safe API, you can get UB and I can reject blame and say, "well you're holding it wrong." And thus, we're mostly back into how languages like C deal with these sorts of things. And that is indeed a bummer. And there are for sure examples of that in the ecosystem. But the glaring thing you've left out of your analysis is all of the crates that don't lie and specifically set out to provide a sound API.

The great thing about progress is that we don't have to be perfect. I'm really disappointed that you seem to be missing the forest for the trees here.

[1]: https://github.com/ben0x539/totally-safe-transmute/blob/main...

replies(1): >>43389748 #
DannyBee ◴[] No.43389748[source]
"The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway."

Well, no, actually. At least, not in an (IMHO) useful way.

I can break your safe API by getting the constraints wrong on unsafe code inside that API.

Also, unsafe usage elsewhere is not local. I can break your impossible to misuse API through an unsafe API that someone else used elsewhere, completely outside my control, and then wrapped in a safe API. Some of these are, of course, bugs in rust/the compiler, etc. I'm just noting that i've yet to hear the view that the ability to do this is always a bug in the language/compiler, and will be destroyed on sight.

Beyond that:

To the degree this is useful encapsulation for tracking things down, it is only useful when the amount is small and you can reason about it.

This is simply no longer true in any reasonably sized rust app.

As a result, as you say, it is then only useful for saying who is at fault in the sense of whether i'm holding it wrong. To me, that is basically worthless at scale.

"I'm surprised that someone with as much experience as you is missing this nuance."

I don't miss it - I just don't think it's as useful as claimed.

This level of "encapsulation", which provides no real guarantee except "the set of bugs is caused somewhere by the set of unsafe blocks" is fairly unhelpful at large scale.

I have audited hundreds of thousands of lines of rust code to find bugs caused by unsafe usage. The thing that made it at all tractable was not this form of encapsulation - it was in fact 100% worthless in doing that at scale, because it was still tons and tons of code to try to reason about, across lots of libraries and dependencies. As you say, it only helps assign blame once the bug is found, and blame is not that useful at scale. It does not make the code safer. It does not make the bug easier to track down. It only declares, after i've spent all the time, that it is not my fault. But also nobody has to do anything anyway.

For small programs, this buys you something, as i said, as long as the set of unsafe blocks is small enough to be tractable to audit, cool. You can find bugs easier. In that sense, the tons of hobby programs, small libraries, etc, are a lot less likely to have bugs when written in rust (modulo their dependencies on unsafe code).

But like, your position seems to be that it is fairly useful that i can go to a library and tell them "your crap is broken", and be right about it. To me, this does not buy a lot in the kinds of large complex systems rust hopes to replace in C/C++. (it also might be false)

In actually tracking down the bug, which is what i care about, the thing that was useful is that i could run miri and lots of other things on it and get useful results that pointed me towards the most likely causes of issues.

So don't get me wrong - this is overall better than C, but writing lots of rust (i haven't written C/C++ at all in a while, actually) I still tire of the constant claims of the amount of rust safety. You are the rare rust person who understands the nuance and is willing to admit there is any flaw or non-perfection whatsoever.

As you say, there are lots of things that ought to be true in rust that are not. You have a good understanding of this nuance, and of where it fails.

But it is you, i believe, who is missing the forest for the trees, because most do not have this.

I'll be concrete and i guess controversial in a way you are 100% free to disagree with, but might as well throw a stake in the ground - it's hacker news, might as well have fun making a comment someone can beat me over the head with later: If nothing changes, and the rust ecosystem grows by a factor of 100x while changing nothing about how it behaves WRT unsafe usage, and no tooling gets significantly better, Rust will not end up better than C in practice. I don't mean it will not have fewer bugs/vulnerabilities - i think it would, by far!

But whether you have 100 billion of them, or 1 billion of them, and thus made a 100x improvement, i don't think matters too much when it's still a billion :)

Meanwhile, if the rust ecosystem got worse about unsafe, but made tools like Miri 50x faster (and made more tools like it that help verification in practice), it will still end up better than C.

To me - it is the tooling, and not this sort of encapsulation, that will make a practical difference or not at scale.

The idea that you will convince people not to write broken unsafe code, in ways that breaks safe APIs, or that the ability to assign blame matters, is very strange to me, and is no better than C. As systems grow, the likelihood of totally safe transmutes growing in them is basically 100% :)

FWIW - I also agree you don't have to be perfect, nor do I fault rust for not being perfect. Instead, i simply disagree that at scale, this sort of ability to place blame is useful. To me, it's the ability to find the bugs quickly and as automated as possible that is useful.

I need to find the totally safe transmutes causing issues in my system, not hand it to someone else after determining it couldn't be my fault.

replies(2): >>43390293 #>>43391330 #
1. burntsushi ◴[] No.43390293[source]
> I can break your safe API by getting the constraints wrong on unsafe code inside that API.

This doesn't make any sense at all as a broader point. Of course you can break the safe API by introducing a bug inside the implementation! I honestly just cannot figure out how you have a misunderstanding of this magnitude, and I'm forced to conclude that we are mis-communicating at some level.

I did read the rest of your comment, and the most significant point I can take away from it is that you're making a claim about scale. I think the dissonance introduced with comments like the one above makes it very hard for me to trust your experience here and the conclusions you've drawn from it. But I will note that whether Rust's safety story scales is from my perspective a different thing entirely from the factual claim that Rust enables safe encapsulation of `unsafe` usage.

You may say that just because Rust enables safe encapsulation doesn't mean programmers using Rust actually follow through with that in practice. And yes, absolutely, it doesn't. You can't derive an is from an ought. But in my experience, it totally does. I do work on lots of "hobby" stuff in Rust (although I try to treat it professionally, I just mean that I am not directly paid for it beyond donations), but I am also paid to write Rust too. I do not have your experience with Rust at scale, so I cannot refute it. But you've said enough questionable things here that I can't trust it either.

replies(1): >>43394829 #
2. DannyBee ◴[] No.43394829[source]
This doesn't seem like we are getting anywhere on this part of the thread, unfortunately.

My suggestion would be - if we are ever in the same place, let's just grab coffee or something.

In the end - i suspect we are just going to find we have different enough experiences that our views of safe encapsulation and its usefulness are very different.

Let's put that aside for a second - I'll also take one more pass at the original place we started, and then give up:

To go back all the way to where we started, the comment i was originally replying to said "No, C lacks encapsulation of unsafe code. This is very important. Encapsulation is the only way to scale local reasoning into global correctness."

So we were in fact talking about scale, and more particularly how to scale to global correctness - not really whether rust enables safe encapsulation, but whether encapsulation itself enables local reasoning to scale to global correctness (in theory or in practice).

My view here, restated more succinctly, is "their claim that encapsulation is the only way to scale local reasoning to global correctness is emphatically wrong" (both in theory and practice).

My argument there remains simple: Tooling is what enables you to scale local reasoning to global correctness, not encapsulation.

Putting aside how useful or not it is otherwise for a second, encapsulation, by itself, does not enable you to reason your way from local results to global results soundly at all - for exactly the reason you mention in the first sentence here - bugs in local correctness reasoning can have global correctness effects. Garbage in, garbage out. Encapsulation does not wave a wand at this and make it go away[1]. There are a lot of other reasons, this is just the one we went down a bit of a rabbit hole on :)

Instead, it is tooling that lets you scale. If you have tooling that catches 95+% of local reasoning errors (feel free to choose your own bar), you can almost certainly parlay that into high-percent global correctness, regardless of whether anything is encapsulated at all or not.

Now: If encapsulation enables an easier job of that tooling, and i believe it helps a lot, fwiw, then that's useful. But it's the tooling you want, not the encapsulation. Again, concretely: If I could not safely encapsulate anything, but had tooling that caught 100% of local reasoning issues, i would be much better off than having 100% safely encapsulated code, but no tooling to verify local or global reasoning. This is true (to me) even if you lower the "catches 100% of local reasoning issues" down significantly.

[1] FWIW, i also don't argue that this problem is particular to rust. It's not, of course. It exists everywhere. But i'm not the one claiming that rust will enable you to scale local reasoning to global correctness through encapsulation :P

replies(1): >>43399202 #
3. burntsushi ◴[] No.43399202[source]
> To go back all the way to where we started, the comment i was originally replying to said "No, C lacks encapsulation of unsafe code. This is very important. Encapsulation is the only way to scale local reasoning into global correctness."

That's fair. I was focusing more on the factual aspect of "Rust enables encapsulating `unsafe`." But you're right, this statement is making a bigger claim than that, and it crosses over into something that is a (in theory) testable opinion.

I do agree with it though. But I recognize that it is a different claim than the one I was putting forward as factual.

I think for this, I would say that my experience with Rust has demonstrated that encapsulation is working at some non-trivial scale. The extent to which it will continue to scale depends, in part, on whether people writing Rust prioritize soundness. In my bubble, this prioritization is extremely high. But will what is arguably a cultural norm extend out to all Rust programmers everywhere?

I legitimately don't know. This is why I was one of the first (but not the first) people to make a stink about improper `unsafe` usage inside the Actix project some years ago. It was because I perceived the project as specifically flouting the cultural norm and rejecting soundness as a goal to strive for. I do indeed see this as an essential piece of what Rust brings to the table, and for it to succeed in its goals, we have to somehow figure out how to maintain the cultural norm that safe APIs cannot be used in a way that leads to UB.

I think where you and I differ is both in what we've seen (it sounds like you've seen evidence of this cultural norm eroding) and what we consider encapsulation busting. I'm not at all worried about bugs in `unsafe` code. Those are going to happen, and yes, they will lead to safe Rust having UB. But those are "just" bugs. The vastly more important thing to me is intent and where blame is assigned when UB happens. If blame starts shifting to the safe code, then that will indicate the erosion of that cultural norm.

As for tooling, I think it's vital to making sure safe encapsulations are correct, but I don't see it as having a significant impact on the norm.

Then again, these are the days in which even some of the strongest cultural norms we've had (in the United States anyway) have been eroding. So maybe building a system on top of one is folly.

replies(1): >>43414836 #
4. DannyBee ◴[] No.43414836{3}[source]
"I do agree with it though. But I recognize that it is a different claim than the one I was putting forward as factual."

Maybe the core is that i don't understand why you agree with it :)

Maybe your definition of global correctness is different?

Maybe you are thinking of properties that are different than i am thinking of?

To me, for most (IMHO useful) definitions of global correctness, for most properties, the claim is provably false.

For me, local and global correctness that is useful at scale is not really "user-asserted correctness modulo implementation bugs".

Let's take a property like memory safety and talk about it locally and globally.

Let's just remove some nuance and say that lots of these forms of encapsulation can be thought of as assertions of correctness with respect to memory safety (for this example - obviously there are more things being asserted, and not every kind of encapsulation is memory safe) - i assert that you don't have to worry about this - i did, and i'm sure it's right :)

This assertion, once wrong in a local routine, makes a global claim that "this program is memory safe" now incorrect. Your local correctness did not scale to global correctness here, because your wrong local assertion led to a wrong global answer.

Tooling, on the other hand, could have caught this.

Does it matter? maybe, maybe not! That's the province of creative security researchers and other folks.

My office mate at IBM was once tasked (eons ago) with computing the probability that a random memory bit flip would actually cause a program to misbehave.

Obviously, you can go too far, and end arguing about whether the cosmic rays affecting your program really violate your proof of correctness :)

But for a property like this, i don't want to rely on norms at scale. Because those norms mostly generate assertions of correctness. Once i've got tons and tons of assertions, and nobody has actually proved anything about them, that's a house of cards. Even if they are diligent and right 99% of the time, if you have 100000 of them, that's uh, 1000 of them that are wrong. And as discussed, it only takes one to break global correctness.

If you want all 100k to be correct with 90% probability, you'd need people to be 99.9999% correct. That seems unlikely :)

I don't mean that i'm not willing to accept the norm is better - i am. I certainly would agree the average rust program is more bug free and more safe than C ones. But i've seen too much at scale to not want some mechanism of verifying that norm, or at least a large part of it.

As an aside, there are also, to me, properties that are useful modulo implementation bugs. But for me, these mostly fall into proving algorithmic correctness.

IE it's useful to prove that a lock-free algorithm always makes progress, assuming someone did not screw up the implementation. It's separately useful to be able to prove a given implementation is not screwed up, but often much harder.

As for norms - I have zero disagreement that rust has better norms overall, but yes, i've seen erosion. I would recommend, for example, trying to do some embedded rust programming if you want to see an area where no rust norms seem to exist under the covers.

Almost all libraries are littered with safe encapsulation that is utterly broken in many ways. Not like "oh if you think about this very hard it's broken".

It often feels like they just wanted to make the errors go away, so they put it in an unsafe block, and then didn't want to have to mark everything as unsafe to encapsulate it. I wish I was joking.

These libraries are often the de-facto way to achieve something (like bluetooth support). They are not getting better; they are getting copied, and these pieces reused in chunks, causing the same problems elsewhere. And FWIW, none of these needed much if any unsafe at all (interacting with a bluetooth controller is not as unsafe as it seems - it is mostly just speaking to an embedded uart and issuing it some well-specified commands, so you probably need unsafe to deal with the send/receive, but not much else).

I can give you links and details privately, i don't really want to sort of publicly shame things for the sake of this argument :)

There are very well thought out and done embedded libraries mind you, but uh, they are the minority.

This is not the only area, mind you, but it's an easy one to poke.

All norms fail over time, and you have to plan for it. You don't want to rely on them for things like "memory safety" :)

Good leadership, mentoring, etc makes them fail slower, but the thing that always causes failure is growth. Fast growth is even worse, and there are very few norms that scale and survive factors of 100x. This is especially true when they are cultural norms.

I don't believe Rust will be the first to succeed at maintaining the level of norm it had 5-10 years ago, around this sort of thing, in the face of massive growth and scale.

(Though i have no doubt it can if it neither grows nor scales).

[1] How much global correctness is affected by local correctness depends on the property - there are some where some wrong local answers often change nothing because they are basically minimum(all local answers). There are some where a single wrong local answer makes it totally wrong because they are basically maximum(all local answers). The closer they are to simple union/intersection or min/max of local answers, the easier it is to compute global correctness, but the righter your local answers have to be :)

replies(1): >>43419475 #
5. burntsushi ◴[] No.43419475{4}[source]
> Maybe the core is that i don't understand why you agree with it :)

Because of encapsulation. I don't need to look far to see the effects of encapsulation (and abstraction) on computing.

I read your whole comment, but I really want to tighten this discussion up. I think the biggest thing I'm personally missing from coming over to your view of things is examples. In particular:

> Almost all libraries are littered with safe encapsulation that is utterly broken in many ways. Not like "oh if you think about this very hard it's broken".

Can you show me? If it's really "almost all," then you should even be able to point to a crate I've authored with a broken safe encapsulation. `regex-automata`, `jiff`, `bstr`, `byteorder`, `memchr` and `aho-corasick` all use `unsafe`. Can you find a point of unsoundness?

I don't want a library here or there. I am certain there are some libraries that are intentionally flouting Rust's norms here. So a couple of examples wouldn't be enough to convince me because I don't think a minority of people flouting Rust's norms is a big problem unless it can be shown that this minority is growing in size. What I want to see is evidence that this is both widespread and intentional. It's hard for me to believe that it is without me noticing.

If you want to do this privately, you can email: jamslam@gmail.com