Zlib-rs is faster than C

(trifectatech.org)

YZF ◴[16 Mar 25 20:12 UTC] No.43381858[source]▶

I found out I already know Rust:

        unsafe {
            let x_tmp0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x10);
            xmm_crc0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x01);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, x_tmp0);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, xmm_crc0);

Kidding aside, I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library. At what point does it really stop mattering if this is C or Rust?

Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?

replies(30): >>43381895 #>>43381907 #>>43381922 #>>43381925 #>>43381928 #>>43381931 #>>43381934 #>>43381952 #>>43381971 #>>43381985 #>>43382004 #>>43382028 #>>43382110 #>>43382166 #>>43382503 #>>43382805 #>>43382836 #>>43383033 #>>43383096 #>>43383480 #>>43384867 #>>43385039 #>>43385521 #>>43385577 #>>43386151 #>>43386256 #>>43386389 #>>43387043 #>>43388529 #>>43392530 #

Aurornis ◴[16 Mar 25 20:21 UTC] No.43381931[source]▶

>>43381858 #

Unsafe blocks in Rust are confusing when you first see them. The idea is that you have to opt out of compiler safety guarantees for specific sections of code, and those sections are clearly marked by the unsafe block.

In good practice it’s used judiciously in a codebase where it makes sense. Those sections receive extra attention and analysis by the developers.

Of course you can find sloppy codebases where people reach for unsafe as a way to get around Rust instead of writing code the Rust way, but that’s not the intent.

You can also find die-hard Rust users who think unsafe should never be used and make a point to avoid libraries that use it, but that’s excessive.

replies(10): >>43381986 #>>43382095 #>>43382102 #>>43382323 #>>43385098 #>>43385651 #>>43386071 #>>43386189 #>>43386569 #>>43392018 #

chongli ◴[16 Mar 25 20:41 UTC] No.43382102[source]▶

>>43381931 #

Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees? As far as I'm aware, inside the unsafe block you can do whatever you want which means all of the nice memory-safety properties of the language go away.

It's like letting a wet dog (who'd just been swimming in a nearby swamp) run loose inside your hermetically sealed cleanroom.

replies(16): >>43382176 #>>43382305 #>>43382448 #>>43382481 #>>43382485 #>>43382606 #>>43382685 #>>43382739 #>>43383207 #>>43383637 #>>43383811 #>>43384238 #>>43384281 #>>43385190 #>>43385656 #>>43387402 #

timschmidt ◴[16 Mar 25 20:49 UTC] No.43382176[source]▶

>>43382102 #

It seems like you've got it backwards. Even unsafe Rust is still more strict than C. Here's what the book has to say (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html):

"You can take five actions in unsafe Rust that you can’t in safe Rust, which we call unsafe superpowers. Those superpowers include the ability to:

    Dereference a raw pointer
    Call an unsafe function or method
    Access or modify a mutable static variable
    Implement an unsafe trait
    Access fields of a union

It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks: if you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these five features that are then not checked by the compiler for memory safety. You’ll still get some degree of safety inside of an unsafe block.

In addition, unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: the intent is that as the programmer, you’ll ensure the code inside an unsafe block will access memory in a valid way.

People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."

replies(6): >>43382290 #>>43382353 #>>43382376 #>>43383159 #>>43383265 #>>43386165 #

uecker ◴[16 Mar 25 21:14 UTC] No.43382376[source]▶

>>43382176 #

This description is still misleading. The preconditions for the correctness of an unsafe block can very much depend on the correctness of the code outside it, and it is easy to find Rust bugs where exactly this was the cause. This is very similar to C, where out-of-bounds accesses are often caused by a logic error elsewhere. Also, an unsafe block has to maintain all the invariants that the safe Rust part relies on for correctness.

replies(4): >>43382514 #>>43382566 #>>43382585 #>>43383088 #

lambda ◴[16 Mar 25 21:36 UTC] No.43382585[source]▶

>>43382376 #

So, it's true that unsafe code can depend on preconditions that need to be upheld by safe code.

But using ordinary module encapsulation and private fields, you can scope the code that needs to uphold those preconditions to a particular module.

So the "trusted computing base" for the unsafe code can still be scoped and limited, allowing you to reduce the amount of code you need to audit and be particularly careful about for upholding safety guarantees.

Basically, when writing unsafe code, the actual unsafe operations are scoped to only the unsafe blocks, and they have preconditions that you need to scope to a particular module boundary to ensure that there's a limited amount of code that needs to be audited to ensure it upholds all of the safety invariants.

Ralf Jung has written a number of good papers and blog posts on this topic.

replies(1): >>43382721 #

uecker ◴[16 Mar 25 21:47 UTC] No.43382721[source]▶

>>43382585 #

And you think one cannot modularize C code and encapsulate critical buffer operations in much safer APIs? One can; the problem is that a lot of legacy C code was not written this way. Also, a lot of newly written C code is not written this way, but the reason is often that people cut corners when they need to get things done with limited time and resources. You will see the same with Rust.
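As a rough illustration of the kind of encapsulation meant here, a minimal C sketch of a fixed-size buffer that is only touched through bounds-checked functions. The names `byte_buf`, `byte_buf_init`, `byte_buf_append` and `byte_buf_get` are made up for this example and do not come from zlib or any other library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration: a fixed-capacity byte buffer.
   Invariant the functions below maintain: len <= sizeof data. */
typedef struct {
    unsigned char data[64];
    size_t len;
} byte_buf;

void byte_buf_init(byte_buf *b) {
    b->len = 0;
}

/* Appends n bytes from src. Returns false and leaves the buffer
   unchanged when the bytes would not fit. */
bool byte_buf_append(byte_buf *b, const void *src, size_t n) {
    if (n > sizeof b->data - b->len)
        return false;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return true;
}

/* Reads the byte at index i into *out. Returns false when i is
   outside the filled part of the buffer. */
bool byte_buf_get(const byte_buf *b, size_t i, unsigned char *out) {
    if (i >= b->len)
        return false;
    *out = b->data[i];
    return true;
}
```

Nothing in C stops a caller from writing to `b->len` directly; hiding the struct behind an opaque pointer would tighten that, though callers could then no longer declare a `byte_buf` on the stack.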

replies(4): >>43383131 #>>43383951 #>>43384869 #>>43386840 #

gf000 ◴[16 Mar 25 22:29 UTC] No.43383131[source]▶

>>43382721 #

Even innocent-looking C code can be chock-full of UB that can invalidate your "local reasoning" capabilities. So, not even close.

replies(1): >>43383379 #

wavemode ◴[16 Mar 25 22:57 UTC] No.43383379[source]▶

>>43383131 #

Care to share an example?

replies(3): >>43383437 #>>43383963 #>>43385097 #

masfuerte ◴[17 Mar 25 00:28 UTC] No.43383963[source]▶

>>43383379 #

   int average(int x, int y) {
       return (x+y)/2;
   }

replies(3): >>43385221 #>>43392246 #>>43445900 #

throwaway2037 ◴[17 Mar 25 04:29 UTC] No.43385221[source]▶

>>43383963 #

I assume you are hinting that 'int' is signed here, and that signed overflow is UB in C? Real question: ignoring what the ISO C language spec says, are there any modern hardware platforms (say, ARM64 and x86-64) that do not use two's complement to implement signed integers? I don't know of any. As I understand it, two's complement correctly supports overflow for signed arithmetic.

I might be old, but more than 10 years ago, hardly anyone talked about UB in C and C++ programming. In the last 10 years it has become all the rage, but it seems to add very little to the conversation. For example, if you program C or C++ with the Win32 API, there are loads of weird UB-ish things that seem to work fine.
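On the two's complement question: wrapping is what the CPU does, but the C standard still leaves signed overflow undefined, so a compiler is allowed to optimize on the assumption that it never happens. A minimal sketch of checking for overflow before adding, using only operations that are themselves defined; `add_would_overflow` is a hypothetical name for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true when x + y would not fit in an int.
   INT_MAX - y (for y > 0) and INT_MIN - y (for y < 0) always fit,
   so the check itself never overflows. */
bool add_would_overflow(int x, int y) {
    if (y > 0)
        return x > INT_MAX - y;
    if (y < 0)
        return x < INT_MIN - y;
    return false;
}
```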

replies(3): >>43385280 #>>43385345 #>>43385566 #

oneshtein ◴[17 Mar 25 06:00 UTC] No.43385566[source]▶

>>43385221 #

AI rewrote it to avoid undefined behavior:

  int average(int x, int y) {
    long sum = (long)x + y;
    if(sum > INT_MAX || sum < INT_MIN)
        return -1; // or any value that indicates an error/overflow
  
    return (int)(sum / 2);
  }

replies(5): >>43386128 #>>43386231 #>>43386269 #>>43386613 #>>43396071 #

1. throwaway2037 ◴[17 Mar 25 08:29 UTC] No.43386269[source]▶

>>43385566 #

I don't know why this answer was downvoted. It adds valuable information to this discussion. Yes, I know that someone already pointed out that sizeof(int) is not guaranteed on all platforms to be smaller than sizeof(long). Meh. Just change the type to long long, and it works well.

replies(4): >>43386284 #>>43386391 #>>43389387 #>>43396082 #

2. gf000 ◴[17 Mar 25 08:33 UTC] No.43386284[source]▶

>>43386269 (TP) #

It literally returns a valid output value as an error.

replies(1): >>43389527 #

3. josefx ◴[17 Mar 25 08:53 UTC] No.43386391[source]▶

>>43386269 (TP) #

> Meh. Just change the type to long long, and it works well.

C libraries tend to support a lot of exotic platforms. zlib for example supports Unicos, where int, long int and long long int are all 64 bits large.

4. NobodyNada ◴[17 Mar 25 15:09 UTC] No.43389387[source]▶

>>43386269 (TP) #

Copypasting a comment into an LLM, and then copypasting its response back is not a useful contribution to a discussion, especially without even checking to be sure it got the answer right. If I wanted to know what an LLM had to say, I could go ask it myself; I'm on HN because I want to know what people have to say.

replies(1): >>43389546 #

5. oneshtein ◴[17 Mar 25 15:22 UTC] No.43389527[source]▶

>>43386284 #

An error value is valid output in both cases.

replies(1): >>43393545 #

6. ◴[17 Mar 25 15:25 UTC] No.43389546[source]▶

>>43389387 #

7. MaxBarraclough ◴[17 Mar 25 22:50 UTC] No.43393545{3}[source]▶

>>43389527 #

The code is unarguably wrong.

average(INT_MAX, INT_MAX) should return INT_MAX, but it will get that wrong and return -1.

average(0,-2) should not return a special error-code value, but this code will do just that, making -1 an ambiguous output value.

Even its comment is wrong. We can see from the signature of the function that there can be no value that indicates an error, as every possible value of int may be a legitimate output value.
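To make the ambiguity concrete, here is the posted function again under the hypothetical name `average_posted`, unchanged apart from the name and the `<limits.h>` include it needs. Like the original, it assumes `long` is wider than `int`.

```c
#include <limits.h>

/* The function as posted, renamed for this example. */
int average_posted(int x, int y) {
    long sum = (long)x + y;
    if (sum > INT_MAX || sum < INT_MIN)
        return -1; /* intended as an error marker */
    return (int)(sum / 2);
}
```

`average_posted(0, -2)` and `average_posted(-1, -1)` both return -1 as a perfectly ordinary average, so a caller has no way to tell those results apart from the error marker.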

It's possible to implement this function in a portable and standard way though, along the lines of [0].

[0] https://stackoverflow.com/a/61711253/ (Disclosure: this is my code.)
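One possible way to do it without a wider type, as a sketch only. This is not the code from the linked answer; the name `average_portable` and the choice to round toward zero (what `(x+y)/2` would give if the sum could not overflow) are assumptions made for this example.

```c
#include <limits.h>

/* Average of two ints, rounded toward zero, with no intermediate
   overflow and no reliance on a wider integer type. */
int average_portable(int x, int y) {
    /* Opposite signs: the sum always fits in an int. */
    if ((x < 0) != (y < 0))
        return (x + y) / 2;

    /* Same sign: the difference hi - lo always fits in an int. */
    int lo = x < y ? x : y;
    int hi = x < y ? y : x;
    int half = (hi - lo) / 2;

    /* Step from the end nearer zero so the result rounds toward zero. */
    return x < 0 ? hi - half : lo + half;
}
```

With this version `average_portable(INT_MAX, INT_MAX)` is `INT_MAX` and `average_portable(0, -2)` is -1, with no special error value involved.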

replies(1): >>43396843 #

8. umanwizard ◴[18 Mar 25 05:33 UTC] No.43396082[source]▶

>>43386269 (TP) #

I always downvote all AI-generated content regardless of whether it’s right or wrong, because I would like to discourage people from posting it.

9. MaxBarraclough ◴[18 Mar 25 08:19 UTC] No.43396843{4}[source]▶

>>43393545 #

Too late for me to edit: as josefx pointed out, it also fails to properly address the undefined behavior. The sums INT_MAX + INT_MAX and INT_MIN + INT_MIN may still overflow despite being done using the long type.

That won't occur on an 'LP64' platform, [0] but we should aim for proper portability and conformance to the C language standard.

[0] https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_m...
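If widening is still preferred, the width assumption can at least be checked at compile time instead of hoped for. A minimal sketch, with the hypothetical name `average_widened`, that refuses to compile on a platform where `long long` cannot hold the sum of two ints:

```c
#include <limits.h>

#if LLONG_MAX / 2 >= INT_MAX && LLONG_MIN / 2 <= INT_MIN
/* long long can represent every possible int + int here, so the sum
   cannot overflow and the halved result always fits back in an int. */
int average_widened(int x, int y) {
    return (int)(((long long)x + y) / 2);
}
#else
#error "long long is not wide enough to hold int + int on this platform"
#endif
```

On a platform like the Unicos one mentioned above, where int and long long are the same width, this would fail the build rather than overflow at run time.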
