
Zlib-rs is faster than C

(trifectatech.org)
341 points by dochtman | 193 comments
YZF ◴[] No.43381858[source]
I found out I already know Rust:

        unsafe {
            let x_tmp0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x10);
            xmm_crc0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x01);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, x_tmp0);
            xmm_crc1 = _mm_xor_si128(xmm_crc1, xmm_crc0);
Kidding aside, I thought the purpose of Rust was safety, but the keyword unsafe is sprinkled liberally throughout this library. At what point does it really stop mattering whether this is C or Rust?

Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?

replies(30): >>43381895 #>>43381907 #>>43381922 #>>43381925 #>>43381928 #>>43381931 #>>43381934 #>>43381952 #>>43381971 #>>43381985 #>>43382004 #>>43382028 #>>43382110 #>>43382166 #>>43382503 #>>43382805 #>>43382836 #>>43383033 #>>43383096 #>>43383480 #>>43384867 #>>43385039 #>>43385521 #>>43385577 #>>43386151 #>>43386256 #>>43386389 #>>43387043 #>>43388529 #>>43392530 #
1. Aurornis ◴[] No.43381931[source]
Using unsafe blocks in Rust is confusing when you first see it. The idea is that you have to opt out of compiler safety guarantees for specific sections of code, but those sections are clearly marked by the unsafe block.

In good practice it’s used judiciously in a codebase where it makes sense. Those sections receive extra attention and analysis by the developers.

Of course you can find sloppy codebases where people reach for unsafe as a way to get around Rust instead of writing code the Rust way, but that’s not the intent.

You can also find die-hard Rust users who think unsafe should never be used and make a point to avoid libraries that use it, but that’s excessive.
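
For illustration, a minimal sketch (my own toy example, not from zlib-rs) of what that opt-out looks like; the block is trivially greppable, and everything around it stays fully checked:

    fn main() {
        let n: u64 = 0x1234_5678_9abc_def0;
        // SAFETY: u64 and [u8; 8] have the same size and neither has
        // invalid bit patterns, so this transmute is sound. (Real code
        // would just call n.to_ne_bytes(); this only shows the marking.)
        let bytes = unsafe { std::mem::transmute::<u64, [u8; 8]>(n) };
        println!("{bytes:?}");
    }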

replies(10): >>43381986 #>>43382095 #>>43382102 #>>43382323 #>>43385098 #>>43385651 #>>43386071 #>>43386189 #>>43386569 #>>43392018 #
2. timschmidt ◴[] No.43381986[source]
Unsafe is a very distinct code smell. Like the hydrogen sulfide added to natural gas to allow folks to smell a gas leak.

If you smell it when you're not working on the gas lines, that's a signal.

replies(6): >>43382188 #>>43382239 #>>43384810 #>>43385163 #>>43385670 #>>43386705 #
3. api ◴[] No.43382095[source]
The idea is that you can trivially search the code base for "unsafe" and closely examine all unsafe code, and unless you are doing really low-level stuff there should not be much of it. Higher level code bases should ideally have none.

It tends to be found in drivers, kernels, vector code, and low-level implementations of data structures and allocators and similar things. Not typical application code.

As a general rule it should be avoided unless there's a good reason to do it. But it's there for a reason. It's almost impossible to create a systems language that imposes any kind of rules (like ownership etc.) that cover all possible cases and all possible optimization patterns on all hardware.

replies(2): >>43382120 #>>43382568 #
4. chongli ◴[] No.43382102[source]
Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees? As far as I'm aware, inside the unsafe block you can do whatever you want which means all of the nice memory-safety properties of the language go away.

It's like letting a wet dog (who'd just been swimming in a nearby swamp) run loose inside your hermetically sealed cleanroom.

replies(16): >>43382176 #>>43382305 #>>43382448 #>>43382481 #>>43382485 #>>43382606 #>>43382685 #>>43382739 #>>43383207 #>>43383637 #>>43383811 #>>43384238 #>>43384281 #>>43385190 #>>43385656 #>>43387402 #
5. timschmidt ◴[] No.43382120[source]
So much so that it's even possible to write bare-metal microcontroller firmware in Rust without unsafe, as the embedded-hal ecosystem wraps unsafe hardware interfaces in a modular, fairly universal safe API.
6. timschmidt ◴[] No.43382176[source]
It seems like you've got it backwards. Even unsafe rust is still more strict than C. Here's what the book has to say (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html)

"You can take five actions in unsafe Rust that you can’t in safe Rust, which we call unsafe superpowers. Those superpowers include the ability to:

    Dereference a raw pointer
    Call an unsafe function or method
    Access or modify a mutable static variable
    Implement an unsafe trait
    Access fields of a union
It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks: if you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these five features that are then not checked by the compiler for memory safety. You’ll still get some degree of safety inside of an unsafe block.

In addition, unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: the intent is that as the programmer, you’ll ensure the code inside an unsafe block will access memory in a valid way.

People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."
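
A small sketch of that last point (mine, not from the book): inside the block you gain the raw-pointer superpower, but references are still borrow-checked as usual:

    fn main() {
        let mut x = 5;
        let p = &mut x as *mut i32;
        unsafe {
            // The superpower: dereferencing a raw pointer.
            *p += 1;
            // Borrowing rules still apply in here: code that creates
            // overlapping &mut references is rejected, unsafe or not.
        }
        println!("{x}");
    }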

replies(6): >>43382290 #>>43382353 #>>43382376 #>>43383159 #>>43383265 #>>43386165 #
7. cmrdporcupine ◴[] No.43382188[source]
Look, no. Just go read the unsafe block in question. It's just SIMD intrinsics. No memory access. No pointers. It's unsafe in name only.

No need to get all moral about it.

replies(3): >>43382234 #>>43382266 #>>43382480 #
8. kccqzy ◴[] No.43382234{3}[source]
By your line of reasoning, SIMD intrinsic functions should not be marked as unsafe in the first place. Then why are they marked as unsafe?
replies(4): >>43382276 #>>43382451 #>>43384972 #>>43385883 #
9. mrob ◴[] No.43382239[source]
There's no standard recipe for natural gas odorant, but it's typically a mixture of various organosulfur compounds, not hydrogen sulfide. See:

https://en.wikipedia.org/wiki/Odorizer#Natural_gas_odorizers

replies(2): >>43382271 #>>43386386 #
10. timschmidt ◴[] No.43382266{3}[source]
I don't read any moralizing in my previous comment. And it seems to mirror the relevant section in the book:

"People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."

I hope the SIMD intrinsics make it to stable soon so folks can ditch unnecessary unsafes if that's the only issue.

11. timschmidt ◴[] No.43382271{3}[source]
TIL!
12. cmrdporcupine ◴[] No.43382276{4}[source]
There's no standardization of SIMD in Rust yet; the intrinsics have been sitting in nightly unstable for years:

https://doc.rust-lang.org/std/intrinsics/simd/index.html

So I suspect it's a matter of two things:

1. You're calling out to what's basically assembly, so buyer beware. This is basically FFI into C/asm.

2. There's no guarantee that what comes out of those 128-bit vectors afterwards follows any sanity or expectations, so... buyer beware. Same reason std::mem::transmute is marked unsafe.

It's really the weakest form of unsafe.

Still entirely within the bounds of a sane person to reason about.

replies(3): >>43382389 #>>43382440 #>>43385419 #
13. pclmulqdq ◴[] No.43382290{3}[source]
The way I have heard it described that I think is a bit more succinct is "unsafe admits undefined behavior as though it was safe."
14. CooCooCaCha ◴[] No.43382305[source]
I wouldn’t go that far. Bevy, for example, uses unsafe internally but is VERY strict about it, and every use of unsafe requires a comment explaining why the code is safe.

In other words, unsafe works if you use it carefully and keep it contained.

replies(1): >>43382540 #
15. colonwqbang ◴[] No.43382323[source]
Can’t Rust do safe SIMD? This is just vectorised multiplication and XOR, but it gets labelled as unsafe. I imagine most code that wants to be fast would use SIMD to some extent.
replies(1): >>43382443 #
16. Someone ◴[] No.43382353{3}[source]
But “Dereference a raw pointer”, in combination with the ability to create raw pointers pointing to arbitrary memory addresses (which you can do even in safe Rust), allows you to write to arbitrary memory from unsafe Rust.

So, in theory, unsafe rust opens the floodgates. In practice, though, you can use small fragments of unsafe code that programmers can fairly easily check to be safe.

Then, once you’ve convinced yourself that those fragments are safe, you can be assured that your whole program is safe (using ‘safe’ in the rust sense, of course)

So, there may be some small islands of unsafe code that require extra attention from the programmer, but that should be just a tiny fraction of all lines, and you should be able to verify those islands in isolation.
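
A sketch of the point (my example; assume the arbitrary address is not mapped): creating the raw pointer is safe, only the dereference needs unsafe, and it is only sound if the address is actually valid:

    fn main() {
        let x = 42u32;
        let good = &x as *const u32;               // creating raw pointers is safe
        let bad = 0xdead_beef_usize as *const u32; // even to arbitrary addresses
        unsafe {
            println!("{}", *good);  // sound: `good` points at a live u32
            // println!("{}", *bad); // would compile too, but is UB
        }
        let _ = bad;
    }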

replies(1): >>43382404 #
17. uecker ◴[] No.43382376{3}[source]
This description is still misleading. The preconditions for the correctness of an unsafe block can very much depend on the correctness of the code outside it, and it is easy to find Rust bugs where exactly this was the cause. This is very similar to how C out-of-bounds accesses are often caused by some logic error elsewhere. Also, an unsafe block has to maintain all the invariants the safe Rust part needs to maintain correctness.
replies(4): >>43382514 #>>43382566 #>>43382585 #>>43383088 #
18. pclmulqdq ◴[] No.43382389{5}[source]
> they've been sitting in nightly unstable for years

So many very useful features of Rust and its core library spend years in "nightly" because the maintainers of those features don't have the discipline to see them through.

replies(3): >>43382419 #>>43383440 #>>43385204 #
19. steveklabnik ◴[] No.43382404{4}[source]
> allows you

This is where the rubber hits the road. Rust does not allow you to do this, in the sense that this is possibly undefined behavior. That "possibly" is why the compiler allows you to write this code, because by saying "unsafe", you are promising that this specific arbitrary address is legal for you to write to. But that doesn't mean that it's always legal to do so.

replies(1): >>43382457 #
20. cmrdporcupine ◴[] No.43382419{6}[source]
simd and allocator_api are the two that irritate me enough to consider a different language for future systems dev projects.

I don't have the personality or time to wade into committee type work, so I have no idea what it would take to get those two across the finish line, but the allocator one in particular makes me question Rust for lower level applications. I think it's just not going to happen.

If Zig had proper ADTs and something equivalent to the borrow checker, I'd be inclined to poke at it more.

replies(1): >>43385115 #
21. steveklabnik ◴[] No.43382440{5}[source]
> There's no standardization of simd in Rust yet

Of safe SIMD, but some stuff in core::arch is stabilized. Here's the first bit called in the example of the OP: https://doc.rust-lang.org/core/arch/x86/fn._mm_clmulepi64_si...

22. steveklabnik ◴[] No.43382443[source]
It's still nightly-only.
23. SkiFire13 ◴[] No.43382448[source]
You lose the nice guarantees inside the `unsafe` block, but the point is to write a sound and safe interface over it, that is an API that cannot lead to UB no matter how other safe code calls it. This is basically the encapsulation concept, but for safety.

To continue the analogy of the dog: you let the dog get wet (= you use unsafe), but you put a cleaning room (= the sound and safe API) before your sealed room (= the safe code world).
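
The standard library's split_at_mut is the textbook cleaning room; roughly (a simplified sketch, not the real implementation):

    use std::slice;

    // The borrow checker can't see that the two halves don't overlap,
    // so the proof lives in the assert plus the unsafe block; callers
    // only ever see a safe function.
    fn split_at_mut(v: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
        let len = v.len();
        let ptr = v.as_mut_ptr();
        assert!(mid <= len); // this check is what makes the block sound
        unsafe {
            (
                slice::from_raw_parts_mut(ptr, mid),
                slice::from_raw_parts_mut(ptr.add(mid), len - mid),
            )
        }
    }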

24. CryZe ◴[] No.43382451{4}[source]
They are in the process of marking them safe, which is enabled through the target_feature 1.1 RFC.

In fact, it has already been merged two weeks ago: https://github.com/rust-lang/stdarch/pull/1714

The change is already visible on nightly: https://doc.rust-lang.org/nightly/core/arch/x86/fn._mm_xor_s...

Compared to stable: https://doc.rust-lang.org/core/arch/x86/fn._mm_xor_si128.htm...

So this should be stable in 1.87 on May 15 (Rust's 10 year anniversary since 1.0)

25. timschmidt ◴[] No.43382457{5}[source]
The compiler won't allow you to compile such code without the unsafe. The unsafe is *you* promising the compiler that *you* have checked to ensure that the address will always be legal. So that the compiler will allow you to compile the code.
replies(1): >>43382475 #
26. steveklabnik ◴[] No.43382475{6}[source]
Right, I'm saying "allow" has two different connotations, and only one of them, the one that you're talking about, applies.
replies(1): >>43382596 #
27. SkiFire13 ◴[] No.43382480{3}[source]
SIMD intrinsics are unsafe because they are available only under some CPU features.
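
Hence the usual pattern, sketched below with names of my own choosing (the detection macro std::arch::is_x86_feature_detected! is stable std): prove the feature at runtime, so the intrinsic's precondition is upheld:

    #[cfg(target_arch = "x86_64")]
    fn xor16(dst: &mut [u8; 16], src: &[u8; 16]) {
        if std::arch::is_x86_feature_detected!("sse2") {
            // SAFETY: SSE2 was just verified present, and both arrays
            // are exactly 16 bytes, matching the 128-bit load/store.
            unsafe {
                use std::arch::x86_64::*;
                let a = _mm_loadu_si128(dst.as_ptr() as *const __m128i);
                let b = _mm_loadu_si128(src.as_ptr() as *const __m128i);
                _mm_storeu_si128(dst.as_mut_ptr() as *mut __m128i, _mm_xor_si128(a, b));
            }
        } else {
            for (d, s) in dst.iter_mut().zip(src) {
                *d ^= s; // scalar fallback
            }
        }
    }
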
28. timeon ◴[] No.43382481[source]
> unsafe even a single time, you lose all of Rust's nice guarantees

Not sure why a single use would result in losing all of them. One of Rust's advantages is the clear boundary between safe and unsafe.

replies(1): >>43387667 #
29. wongarsu ◴[] No.43382485[source]
If your unsafe code violates invariants it was supposed to uphold, that can wreck safety properties the compiler was trying to uphold elsewhere. If you can achieve something without unsafe you definitely should (safe, portable simd is available in rust nightly, but it isn't stable yet).

At the same time, unsafe doesn't just turn off all compiler checks, it just gives you tools to go around them, as well as tools that happen to go around them because of the way they work. Rust unsafe is this weird mix of being safer than pure C, but harder to grasp; with lots of nuanced invariants you have to uphold. If you want to ensure your code still has all the nice properties the compiler guarantees (which go way beyond memory safety) you would have to carefully examine every unsafe block. Which few people do, but you generally still end up with a better status quo than C/C++ where any code can in principle break properties other code was trying to uphold.

30. iknowstuff ◴[] No.43382514{4}[source]
No. Correctness of code outside unsafe depends on correctness inside those blocks, not the other way around
replies(1): >>43382600 #
31. tonyhart7 ◴[] No.43382540{3}[source]
Right. The point is raising awareness; the assumption is that it's not a 100-or-0 problem.
32. dwattttt ◴[] No.43382566{4}[source]
It's true, but if you hold Rust to this analysis, it's only fair to hold other languages to it too: the scrutiny you're implying an unsafe Rust block needs has to be applied to all C code, because all C code could depend on code anywhere else for its safety characteristics.

In practice (in both languages) you check what the actual unsafe code does (or "all" code in C's case), note code that depends on external actors for safety (it's not all C code, nor is it all unsafe Rust blocks), and check their callers (and callers callers, etc).

replies(1): >>43382684 #
33. formerly_proven ◴[] No.43382568[source]
My understanding from Aria Beingessner's and some other writings is that unsafe{} rust is significantly harder to get right in "non-trivial cases" than C, because the semantics are more complex and less specified.
replies(2): >>43382970 #>>43383545 #
34. lambda ◴[] No.43382585{4}[source]
So, it's true that unsafe code can depend on preconditions that need to be upheld by safe code.

But using ordinary module encapsulation and private fields, you can scope the code that needs to uphold those preconditions to a particular module.

So the "trusted computing base" for the unsafe code can still be scoped and limited, allowing you to reduce the amount of code you need to audit and be particularly careful about for upholding safety guarantees.

Basically, when writing unsafe code, the actual unsafe operations are scoped to only the unsafe blocks, and they have preconditions that you need to scope to a particular module boundary to ensure that there's a limited amount of code that needs to be audited to ensure it upholds all of the safety invariants.

Ralf Jung has written a number of good papers and blog posts on this topic.

replies(1): >>43382721 #
35. timschmidt ◴[] No.43382596{7}[source]
I gotcha. I misread and misunderstood. Yes, we agree.
36. sunshowers ◴[] No.43382606[source]
What language is the JVM written in?

All safe code in existence running on von Neumann architectures is built on a foundation of unsafe code. The goal of all memory-safe languages is to provide safe abstractions on top of an unsafe core.

replies(3): >>43385347 #>>43385422 #>>43386156 #
37. uecker ◴[] No.43382684{5}[source]
What is true is that there are more operations in C which can cause undefined behavior, and they are more densely distributed over the C code, making it harder to screen for undefined behavior. This is true and Rust certainly has an advantage, but it is not nearly as big an advantage as the "Rust is safe" (please do not look at all the unsafe blocks we need to make it also fast!) and "all C is unsafe" story wants you to believe.
replies(4): >>43382883 #>>43383190 #>>43383793 #>>43385047 #
38. janice1999 ◴[] No.43382685[source]
Claiming unsafe invalidates "all of the nice memory-safety properties" is like saying having windows in your house does away with all the structural integrity of your walls.

There's even unsafe usage in the standard library and it's used a lot in embedded libraries.

replies(1): >>43383773 #
39. uecker ◴[] No.43382721{5}[source]
And you think one cannot modularize C code and encapsulate critical buffer operations in much safer APIs? One can; the problem is that a lot of legacy C code was not written this way. A lot of newly written C code is not written this way either, but the reason is often that people cut corners when they need to get things done with limited time and resources. You will see the same with Rust.
replies(4): >>43383131 #>>43383951 #>>43384869 #>>43386840 #
40. vlovich123 ◴[] No.43382739[source]
You only lose those guarantees if and only if the code within the unsafe block violates the rules of the Rust language.

Normally in safe code you can’t violate the language rules because the compiler enforces various rules. In unsafe mode, you can do several things the compiler would normally prevent you from doing (e.g. dereferencing a naked pointer). If you uphold all the preconditions of the language, safety is preserved.

What’s unfortunate is that the rules you are required to uphold can be more complex than you might anticipate if you’re trying to use unsafe to write C-like code. What’s fortunate is that you rarely need to do this in normal code and in SIMD which is what the snippet is representing there’s not much danger of violating the rules.

41. iknowstuff ◴[] No.43382849{6}[source]
tf are you talking about
replies(2): >>43382906 #>>43382911 #
42. dwattttt ◴[] No.43382883{6}[source]
The places where undefined behaviour can occur are also limited in scope; you insist that that part isn't true, because operations outside those unsafe blocks can impact their safety.

That's only true at the same level of scrutiny as "all C operations can cause undefined behaviour, regardless of what they are", which I find similarly shallow.

43. steveklabnik ◴[] No.43382906{7}[source]
They are (rudely) talking about https://news.ycombinator.com/item?id=43382369
44. dwattttt ◴[] No.43382911{7}[source]
In a more helpful framing: safe Rust code doesn't need to worry about its own correctness, it just is.

Unsafe code can be incorrect (or unsound), and needs to be careful about it. Part of being careful is that safe code can call the unsafe code in a way that triggers that unsoundness; in that way, safe code can cause undefined behaviour in unsafe code.

It's not always the case that this is possible; there are unsafe blocks that don't need to depend on safe code for their correctness.

45. dwattttt ◴[] No.43382970{3}[source]
It's hard to compare. Rust has stricter requirements than C, but looser requirements don't mean easier: ever bit shifted by a variable amount? Hope you never relied on shifting "entirely" out of a variable zeroing it.
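
For the shift case (my sketch): in C, shifting a 32-bit int by 32 or more is UB; in Rust the same operation is a debug-mode panic, or an explicit None with the checked form:

    fn main() {
        let x: u32 = 1;
        let n: u32 = 32;
        // `x << n` would panic in a debug build; the checked form is explicit:
        assert_eq!(x.checked_shl(n), None); // shift >= bit width: no UB, just None
        assert_eq!(x.checked_shl(3), Some(8));
    }
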
46. gf000 ◴[] No.43383088{4}[source]
This is technically correct, but a bit pedantic.

Sure, you can technically just write your own vulnerability for your own program, inject it at an unsafe block, and see the whole world crumble... but the exact same is true for any form of FFI call in any language. Is Java memory safe? Yeah; the fact that I can grab a random pointer and technically break anything I want doesn't change that.

Whether a memory vulnerability may appear anywhere at all, or only within the couple hundred lines of unsafe code throughout the whole project, is a night-and-day difference.

47. gf000 ◴[] No.43383131{6}[source]
Even innocent-looking C code can be chock-full of UB that invalidates your "local reasoning" capabilities. So, not even close.
replies(1): >>43383379 #
48. onnimonni ◴[] No.43383159{3}[source]
Would someone with more experience be able to explain to me why these operations can't be "safe"? What is blocking Rust from producing the same machine code in a "safe" way?
replies(4): >>43383264 #>>43383268 #>>43383285 #>>43383292 #
49. gf000 ◴[] No.43383190{6}[source]
Rust is plenty fast; in fact, there are countless examples of safe Rust that will trivially beat C in performance, thanks to the lack of aliasing enabling better vectorization, among other things. It is also simply a more expressive language that allows writing better optimizations: small strings instead of the laughably slow C strings, or sharing far more data in memory instead of making defensive copies everywhere, because it is safe to do so.

And while there are not many things we have statistics on in CS, memory vulnerabilities being absolutely everywhere in unsafe languages, and Rust cleaning up the absolute majority of them even when only the new parts are written in Rust, are some of the few we do know, based on actual, real-life projects at Google and Microsoft, among others.

A memory-safe low-level language is as novel as it gets. Rust is absolutely not just hype; it actually delivers, and you might want to get with the times.

replies(1): >>43385295 #
50. pdimitar ◴[] No.43383207[source]
Where did you even get that weird extreme take from?

O_o

51. vlovich123 ◴[] No.43383264{4}[source]
Those specific functions are compiler builtin vector intrinsics. The main reason they are unsafe is that they can easily read past the ends of arrays and have type-safety and aliasing issues.

By the way, the Rust compiler does generate such code, because under the hood LLVM runs an autovectorizer when you turn on optimizations. However, for the autovectorizer to do a good job you have to write code in a very special way, and you have no way of controlling whether it kicked in, or whether it did a good job once it did.

There’s work on creating safe abstractions (that also transparently scale to the appropriate vector instruction), but progress on that has felt slow to me personally and it’s not available outside nightly currently.
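
As a sketch of that "very special way" (my example; the exact shape that vectorizes well depends on the target and LLVM version): fixed-width chunks, straight-line lane arithmetic, no data-dependent branches:

    // Eight independent accumulator lanes over fixed-width chunks: a
    // shape LLVM's autovectorizer reliably turns into SIMD adds.
    pub fn sum(xs: &[f32]) -> f32 {
        let chunks = xs.chunks_exact(8);
        let rem = chunks.remainder();
        let mut acc = [0.0f32; 8];
        for c in chunks {
            for i in 0..8 {
                acc[i] += c[i];
            }
        }
        acc.iter().sum::<f32>() + rem.iter().sum::<f32>()
    }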

replies(1): >>43385330 #
52. rybosome ◴[] No.43383265{3}[source]
I believe the post you are replying to was referring to the fact that you could take actions in that unsafe block that would compromise the guarantees of Rust; e.g. you could do something silly, leave the unsafe block, then hit an “impossible” condition later in the program.

A simple example might be modifying a const value deep down in some class, where it only becomes apparent later in the program’s execution. Hence their analogy of the wet dog in a clean room - whatever beliefs you have about the structure of memory in your entire program, and guaranteed by the compiler, could have been undone by a rogue unsafe.

replies(1): >>43396097 #
53. ◴[] No.43383268{4}[source]
54. NobodyNada ◴[] No.43383285{4}[source]
Rust's raw pointers are more-or-less equivalent to C pointers, with many of the same types of potential problems like dangling pointers or out-of-bounds access. Rust's references are the "safe" version of doing pointer operations; raw pointers exist so that you can express patterns that the borrow checker can't prove are sound.

Rust encourages using unsafe to "teach" the language new design patterns and data structures; and uses this heavily in its standard library. For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.

Think of unsafe not as "this code is unsafe", but as "I've proven this code to be safe, and the borrow checker can rely on it to prove the safety of the rest of my program."
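
A tiny sketch of that pattern (mine, far simpler than Vec): the safe method establishes the precondition, and the unsafe block relies on it together with a private-field invariant:

    pub struct Buf {
        data: Box<[u8]>,
        len: usize, // invariant (upheld via private fields): len <= data.len()
    }

    impl Buf {
        pub fn get(&self, i: usize) -> Option<u8> {
            if i < self.len {
                // SAFETY: i < self.len <= self.data.len(), so in bounds.
                Some(unsafe { *self.data.get_unchecked(i) })
            } else {
                None
            }
        }
    }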

replies(1): >>43385326 #
55. adgjlsfhk1 ◴[] No.43383292{4}[source]
Often the unsafe code is at the edges of the type system. E.g. sometimes the proof of safety is that someone read the source code of the C library that you are calling out to. It's not useful to think of machine code as safe or unsafe. Safety often refers to whether the types of your data match the lifetime dataflow.
56. wavemode ◴[] No.43383379{7}[source]
Care to share an example?
replies(3): >>43383437 #>>43383963 #>>43385097 #
57. capitainenemo ◴[] No.43383437{8}[source]
Sorting floats with NaN? Almost anything involving threading and mutation, where people either don't realise how important locks are, or don't realise their code has suddenly been threaded?
58. NobodyNada ◴[] No.43383440{6}[source]
Before I started working with Rust, I spent a lot of time using Swift for systems-y/server-side code, outside of the Apple ecosystem. There is a lot I like about that language, but one of the biggest factors that drove me away was just how fast the Apple team was to add more and more compiler-magic features without considering whether they were really the best possible design. (One example: adding compiler-magic derived implementations of specific protocols instead of an extensible macro system like Rust has.) When these concerns were raised on the mailing lists, the response from leadership was "yes, something like that would be better in the long run, but we want to ship this now." Or even in one case, "yes, that tweak to the design would be better, but we already showed off the old design at the WWDC keynote and we don't want to break code we put in a keynote slide."

When I started working in Rust, I'd want some feature or function, look it up, and find it was unstable, sometimes for years. This was frustrating at first, but then I'd go read the GitHub issue thread and find that there was some design or implementation concern that needed to be overcome, and that people were actively working on it and unwilling to stabilize the feature until they were sure it was the best possible design. And the result of that is that features that do get stabilized are well thought out, generalize, and compose well with everything else in the language.

Yes, I really want things like portable SIMD, allocators, generators, or Iterator::intersperse. But programming languages are the one place I really do want perfect to be the enemy of good. I'd rather it take 5+ years to stabilize features than for us to end up with another Swift or C++.

replies(2): >>43383716 #>>43384703 #
59. NobodyNada ◴[] No.43383545{3}[source]
This is definitely true right now, but I don't think it will always be the case.

Unsafe Rust is currently extremely underspecified and underdocumented, but it's designed to be far more specifiable than C. For example: aliasing rules. When and how you're allowed to alias references in unsafe code is not at all documented and under much active discussion; whereas in C pointer aliasing rules are well defined but also completely insane (casting pointers to a different type in order to reinterpret the bytes of an object is often UB even in completely innocuous cases).

Once Rust's memory model is fully specified and written down, unsafe Rust is trying to go for something much simpler, more teachable, and with less footguns than C.

Huge props to Ralf Jung and the opsem team who are working on answering these questions & creating a formal specification: https://github.com/rust-lang/unsafe-code-guidelines/issues

60. xboxnolifes ◴[] No.43383637[source]
If you have 1 unsafe block, and you have a memory-related crash/issue, where in your Rust code do you think the root cause is located?

This isn't a wet dog in a cleanroom. This is a cleanroom complex with a very small outhouse that is labeled as dangerous.

61. grandiego ◴[] No.43383716{7}[source]
> the response from leadership was "yes, something like that would be better in the long run, but we want to ship this now."

Sounds like the Rust's async story.

replies(2): >>43383751 #>>43384178 #
62. steveklabnik ◴[] No.43383751{8}[source]
Async went through years of work before being stabilized. This isn't true.
63. benjiro ◴[] No.43383773{3}[source]
Where are you more likely to have a burglar enter your home? Windows... Where are you more likely to develop cracks in your walls? Windows... Where are you more likely to develop leaks? Windows (especially roof windows!)...

Sorry, but it's a horrible comparison ;)

If you need to rely on unsafe in a memory-safe language for performance reasons, then there is an issue with the language's compiler at that point that needs to be fixed. Simple as that.

Memory safety is the bread and butter of the language; the moment you start to bypass it for faster memory operations, you can start doing the same in any other language. I mean, you're literally bypassing the main selling point of the language. \_00_/

replies(2): >>43383838 #>>43384027 #
64. pdimitar ◴[] No.43383793{6}[source]
You sound pretty biased, I have to tell you. That snark is not helping any argument you think you might be making -- and you are not making any; you are kind of just making fun of Rust, which is pretty boring and uninformative for any reader.

From my past experience with Rust, the team never had to think about data races once, or about mutable volatile globals. And we all suffered from those decades ago with C, and sometimes C++ as well.

You like those and don't want to migrate? More power to ya! But badmouthing Rust with what seem to be fairly uninformed comments is just low. Inform yourself first.

65. LoganDark ◴[] No.43383811[source]
> Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees?

No, not even close. You only lose Rust's safety guarantees when your unsafe code causes Undefined Behavior. Unsafe code that can be made to cause UB from Safe Rust is typically called unsound, and unsafe code that cannot be made to cause UB from Safe Rust is called sound. As long as your unsafe code is sound, then it does not break any of Rust's guarantees.

For example, unsafe code can still use slices or references provided by Safe Rust, because those are always guaranteed to be valid, even in an unsafe block. However, if from inside that unsafe block you then go on to manufacture an invalid slice or reference using unsafe functions, that is UB and you lose Rust's safety guarantees because of the UB.

66. pdimitar ◴[] No.43383838{4}[source]
> If you need to rely on unsafe in a memory-safe language for performance reasons, then there is a issue with the language compiler at that point, that needs to be fixed. Simple as that.

It actually means "Rust needs to interface with many other systems that are not as stringent as it". Your interpretation has nothing to do with what's actually going on and I am surprised you misinterpreted the situation as hugely as you did.

...And even if everything was written in Rust, `unsafe` would still be needed, because the lower you get [toward the kernel], the more non-determinism you encounter in places.

This "all or nothing" attitude is boring and tiring. We all wish things were super simple, black and white, and all-or-nothing. They are not.

67. nicoburns ◴[] No.43383951{6}[source]
You're a lot more limited in the kinds of APIs you can safely encapsulate in C. For example, you can't safely encapsulate an interface that shares memory between the library and the caller in C. So you're forced into either:

- Exposing an unsafe API and relying on the caller to manually uphold invariants

- Doing things like defensive copying at a performance cost

In many cases Rust gives you the best of both worlds: sharing memory liberally while still having the compiler enforce correctness.

replies(1): >>43392262 #
68. masfuerte ◴[] No.43383963{8}[source]

   int average(int x, int y) {
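       // (x + y) can overflow int, and signed overflow is UB in C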
       return (x+y)/2;
   }
replies(3): >>43385221 #>>43392246 #>>43445900 #
69. unrealhoang ◴[] No.43384027{4}[source]
So is static typing stupid because at the end of the line your program must interface with a stream of untyped bits (I/O)?

Once you internalize that, you unlock the power of encapsulation.

70. NobodyNada ◴[] No.43384178{8}[source]
Rust's async model was shipped as an MVP, not in the sense of "this is a bad design and we just want to ship it"; but rather, "we know this is the first step of the eventual design we want, so we can commit to stabilizing these parts of it now while we work on the rest." There's ongoing work to bring together the rest of the pieces and ergonomics on top of that foundational model; async closures & trait methods were recently stabilized, and work towards things like pin ergonomics & simplifying cheap clones like Rc are underway.

Rust uses this strategy of minimal/incremental stabilization quite often (see also: const generics, impl Trait); the difference between this and what drove me away from Swift is that MVPs aren't shipped unless it's clear that the design choices being made now will still be the right choices when the rest of the feature is ready.

replies(1): >>43384296 #
71. EnnEmmEss ◴[] No.43384238[source]
Jason Orendorff's talk [1] was probably the first time I truly grokked the concept of unsafe in Rust. The core idea behind unsafe in Rust is not to provide an escape from the guarantees provided by Rust. It's to isolate the places where you have no choice but to break the guarantees, and to rigorously code/test the boundaries there, so that anything wrapping the unsafe code can still provide the guarantees.

[1]: https://www.youtube.com/watch?v=rTo2u13lVcQ

72. andyferris ◴[] No.43384281[source]
Rust isn't the only memory-safe language.

As soon as you start playing with FFI and raw pointers in Python, NodeJS, Julia, R, C#, etc. you can easily lose the nice memory-safety properties of those languages: create undefined behavior, segfaults, etc. I'd say Rust is a lot nicer for checking unsafe correctness than other memory-safe languages, and it also makes it easier to dip down to systems-level programming, yet it seems to get a lot of hate for these features.

replies(1): >>43386111 #
73. cmrdporcupine ◴[] No.43384296{9}[source]
IMO shipping async without a standardized API for basic common async facilities (like thread spawning, file/network I/O) was a mistake and basically means that tokio has eaten the whole async side of the language.

Why define runtime independence as a goal, but then make it impossible to write runtime agnostic crates?

(Well, there's the "agnostic" crate at least now)

replies(1): >>43384821 #
74. pclmulqdq ◴[] No.43384703{7}[source]
My personal opinion is that if you want to contribute a language feature, shit or get off the pot. Leaving around a half-baked solution actually raises the required effort for someone who isn't you to add that feature (or an equivalent) because they now have to either (1) ramp up on the spaghetti you wrote or (2) overcome the barrier of explaining why your thing isn't good enough. Neither of those two things are fun (which is important since writing language features is volunteer work) and those things come in the place of doing what is actually fun, which is writing the relevant code.

The fact that the Rust maintainers allow people to put in half-baked features before they are fully designed is the biggest cultural failing of the language, IMO.

replies(1): >>43384769 #
75. dralley ◴[] No.43384769{8}[source]
>The fact that the Rust maintainers allow people to put in half-baked features before they are fully designed is the biggest cultural failing of the language, IMO.

In nightly?

Hard disagree. Letting people try things out in the real world is how you avoid half-baked features. Easy availability of nightly compilers with unstable features allows way more people to get involved in the pre-stabilization polishing phase of things and raise practical concerns instead of theoretical ones.

C++ takes the approach of writing and nitpicking whitepapers for years before any implementations are ready and it's hard to see how that has led to better outcomes relatively speaking.

replies(1): >>43384818 #
76. throwaway150 ◴[] No.43384810[source]
> Like the hydrogen sulfide added to natural gas to allow folks to smell a gas leak.

I am 100% sure that the smell they add to natural gas does not smell like rotten eggs.

replies(2): >>43385005 #>>43385686 #
77. pclmulqdq ◴[] No.43384818{9}[source]
Yeah, we're going to have to agree to disagree on the C++ flow (really the flow for any language that has a written standard) being better. That flow is usually:

1. Big library/compiler does a thing, and people really like it

2. Other compilers and libraries copy that thing, sometimes putting their own spin on it

3. All the kinks get worked out and they write a white paper

4. Eventually the thing becomes standard

That way, everything in the standard library is something that is fully-thought-out and feature-complete. It also gives much more room for competing implementations to be built and considered before someone stakes out a spot in the standard library for their thing.

replies(2): >>43384839 #>>43386079 #
78. dralley ◴[] No.43384821{10}[source]
>IMO shipping async without a standardized API for basic common async facilities (like thread spawning, file/network I/O) was a mistake and basically means that tokio has eaten the whole async side of the language.

I would argue that it's the opposite of a mistake. If you standardize everything before the ecosystem gets a chance to play with it, you risk making mistakes that you have to live with in perpetuity.

replies(1): >>43385278 #
79. dralley ◴[] No.43384839{10}[source]
>That way, everything in the standard library is something that is fully-thought-out and feature-complete

Are C++ features really that much better thought out? Modules were "standardized" half a decade ago, but the list of problems with actually using them in practice is still pretty damn long to the point where adoption is basically non-existent.

I'm not going to pretend to be nearly as knowledgeable about C++ as Rust, but it seems like most new C++ features I hear about are a bit janky or don't actually fit that well with the rest of the language. Something that tends to happen when designing things in an ivory tower without testing them in practice.

replies(1): >>43384882 #
80. lambda ◴[] No.43384869{6}[source]
There is no distinction between safe and unsafe code in C, so it's not possible to make that same distinction that you can in Rust.

And even if you try to provide some kind of safer abstraction, you're limited by the much more primitive type system, that can't distinguish between owned types, unique borrows, and shared borrows, nor can it distinguish thread safety properties.

So you're left to convention and documentation for that kind of information, but nothing checking that you're getting it right, making it easy to make mistakes. And even if you get it right at first, a refactor could change your invariants, and without a type system enforcing them, you never know until someone comes along with a fuzzer and figures out that they can pwn you

replies(1): >>43392234 #
81. pclmulqdq ◴[] No.43384882{11}[source]
They absolutely are. The reason many features are stupid and janky is that the language and its ecosystem have had almost 40 more years to collect cruft.

The fundamental problem with modules is that build systems for C++ have different abstractions and boundaries. C++ modules are like Rust async - something that just doesn't fit well with the language/system and got hammered in anyway.

The reason it seems like they come from nowhere is probably because you don't know where they come from. Most things go through boost, folly, absl, clang, or GCC (or are vendor-specific features) before going to std.

That being said, it's not just C++ that has this flow for adding features to the language. Almost every other major language that is not Rust has an authoritative specification.

replies(2): >>43384950 #>>43386095 #
82. dralley ◴[] No.43384950{12}[source]
What's a Rust feature that you think suffered from their process in a way that C++ would not have?
83. thrance ◴[] No.43384972{4}[source]
For now the caller has to ensure proper alignment of SIMD loads. But in the future a safe API will be made available, once the kinks are ironed out. You can already use it, in fact, by enabling a specific compiler feature [1].

[1] https://doc.rust-lang.org/std/simd/index.html
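
A sketch of what that looks like today (nightly-only, behind the portable_simd feature gate): no intrinsics and no unsafe:

    #![feature(portable_simd)]
    use std::simd::u8x16;

    // Safe, portable SIMD: the compiler lowers the lane-wise XOR to
    // whatever the target supports.
    fn xor16(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
        (u8x16::from_array(a) ^ u8x16::from_array(b)).to_array()
    }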

replies(1): >>43385024 #
84. beacon294 ◴[] No.43385005{3}[source]
They add mercaptan, which has something like 1000x the rotten-egg smell of H2S.
replies(1): >>43387099 #
85. anonymoushn ◴[] No.43385024{5}[source]
There are no loads in the above unsafe block; in practice loadu is just as fast as load, and even if you manually use the aligned load or store, you get a crash. It's silly to say that crashes are unsafe.
replies(1): >>43385188 #
86. lambda ◴[] No.43385047{6}[source]
What Rust provides is a way to build safe abstractions over unsafe code.

Rust's type system (including ownership and borrowing, Sync/Send, etc.), along with its privacy features (allowing types to have private fields that can only be accessed by code in the module that defined them), allows you to create fully safe interfaces around code that uses unsafe; there is provably no combination of uses of the interface which leads to undefined behavior.

Now, yeah, it's also possible to use unsafe in Rust just for applying a local optimisation. And that has fewer benefits than a fully encapsulated safe interface, though it is still easier to audit for potential UB than C.

So you're right that it's on a continuum, but the distinction between safe and unsafe code means you can more easily find the specific places where UB could occur, and the encapsulation and type system makes it possible to create safe abstractions over unsafe code.

87. pests ◴[] No.43385097{8}[source]
https://www.ioccc.org/years.html
88. rendaw ◴[] No.43385098[source]
While everything you say is true, your reply (and most of its siblings!) entirely misses GP's point.

All languages at some point interface with syscalls or low level assembly that can be done wrong, but one of Rust's selling points is a safe wrapping of low-level interactions. Like safe heap allocation/deallocation with `Box`, or swapping with `swap`, etc. Except... here.

Why does a library like zlib need to go beyond Rust's safe offerings? Why doesn't rust provide safe versions of the constructs zlib needs?

89. anonymoushn ◴[] No.43385115{7}[source]
Generic SIMD abstractions are of quite limited use. I'm not sure what's objectionable about the thing Rust has shipped (in nightly) for this, which is more or less the same as the stuff Zig has shipped for this (in a pre-1.0 compiler version).
replies(1): >>43389051 #
90. RossBencina ◴[] No.43385163[source]
Hydrogen sulfide is highly corrosive (a big problem in sewers and associated infrastructure); I highly doubt you would choose to introduce it to gas pipelines on purpose.
91. jchw ◴[] No.43385188{6}[source]
Well, there's a category difference between a crash as in a panic and a crash as in a CPU exception. Usually, "safe" programming limits crashes to language-level error handling, which allows you to easily reason about the nature of crashes: if the type system is sound and your program doesn't use unsafe, the only way it should crash is by panic, and panics are recoverable and leave your program in a well-defined state. By the time you get to a signal handler, you're too late. Admittedly, there are some cases where this is less important than others... misaligned load/store wouldn't lead to a potential RCE, but if it can bring down a program it still is a potential DoS vector.

Of course, in practice, even in Rust, it isn't strictly true that programs without unsafe can't crash with fatal runtime errors. There's always stack overflows, which will crash you with a SIGABRT or equivalent operating system error.

replies(2): >>43387323 #>>43387638 #
92. rat87 ◴[] No.43385190[source]
My understanding is that a user who writes an unsafe block in a safe function is responsible for making sure that it doesn't do anything to mess up safety, and that the function isn't lying about exposing a safe interface. I think at one point before Rust 1.0 there was even a suggestion to rename it trustme. Of course users can easily mess up, but the point is to minimize the use of unsafe so it's easier to check, and to create interfaces that can be used safely.
93. RossBencina ◴[] No.43385204{6}[source]
> maintainers of those features don't have the discipline to see them through.

This take makes me sad. There are a lot of reasons why an open source contributor may not see something through. "Lack of discipline" is only one of them. Others that come to mind are: lack of time, lack of resources, lack of capability (i.e. good at writing code, but struggling to navigate the social complexities of shepherding a significant code change), clinically impaired ability to "stay the course" and "see things through" (e.g. ADHD), or maybe it was a collaborative effort and some of the parties dropped out for any of the aforementioned reasons.

I don't have a solution, but it does kinda suck that open source contribution processes are so dependent on instigators being the responsible party to seeing a change all the way through the pipeline.

94. throwaway2037 ◴[] No.43385221{9}[source]
I assume you are hinting at 'int' is signed here? And, that signed overflow is UB in C? Real question: Ignoring what the ISO C language spec says, are there any modern hardware platforms (say: ARM64 and X86-64) that do not use two's complement to implement signed integers? I don't know any. As I understand, two's complement correctly supports overflow for signed arithmetic.

I might be old, but more than 10 years ago, hardly anyone talked about UB in C and C++ programming. In the last 10 years, it is all the rage, but seems to add very little to the conversation. For example, if you program C or C++ with the Win32 API, there are loads of weird UB-ish things that seem to work fine.

replies(3): >>43385280 #>>43385345 #>>43385566 #
95. no_wizard ◴[] No.43385278{11}[source]
Unless you clearly define how and when you’re going to handle removing a standard or updating it to reflect better use cases.

Language designers admittedly should worry about constant breakage, but it's fine to have some churn, and we shouldn't be so concerned about it that it freezes everything.

96. steveklabnik ◴[] No.43385280{10}[source]
> Ignoring what the ISO C language spec says, are there any modern hardware platforms (say: ARM64 and X86-64) that do not use two's complement to implement signed integers?

This is not how compilers work. Optimization happens based on language semantics, not on what platforms do.

97. throwaway2037 ◴[] No.43385295{7}[source]

    > absolutely laughable c-strings that perform terribly
Not much being said here in 2025. Any good project will quickly switch to a tiny structure that holds a char* and a length. There are plenty of open-source libs to help you.
replies(1): >>43386634 #
98. throwaway2037 ◴[] No.43385326{5}[source]
Why does Vec need to have any unsafe code? If you respond "speed"... then I will scratch my chin.

    > For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.
I'm sure you already know this, but you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.
replies(1): >>43385620 #
99. throwaway2037 ◴[] No.43385330{5}[source]

    > However, for the autovectorizer to do a good job you have to write code in a very special way
Can you give an example of this "very special way"?
replies(1): >>43386642 #
100. jandrewrogers ◴[] No.43385345{10}[source]
At least in recent C++ standards, integers are defined as two’s complement. As a practical matter what hardware like that may still exist doesn’t have a modern C++ compiler, rendering it a moot point.

UB in C is often found where different real hardware architectures had incompatible behavior. Rather than biasing the language for or against different architectures they left it to the compiler to figure out how to optimize for the cases where instruction behavior diverge. This is still true on current architectures e.g. shift overflow behavior which is why shift overflow is UB.

101. throwaway2037 ◴[] No.43385347{3}[source]

    > What language is the JVM written in?
I am pretty sure it is C++.

I like your second paragraph. It is well written.

replies(1): >>43386157 #
102. jandrewrogers ◴[] No.43385419{5}[source]
The example here is trivially safe but more general SIMD safety is going to be extremely difficult to analyze for safety, possibly intractable.

For example, it is perfectly legal to dereference a vector pointer that references illegal memory if you mask the illegal addresses. This is a useful trick and common in e.g. idiomatic AVX-512 code. The mask registers are almost always computed at runtime so it would be effectively impossible to determine if a potentially illegal dereference is actually illegal at compile-time.

I suspect we’ll be hand-rolling unsafe SIMD for a long time. The different ISAs are too different, inconsistent, and weird. A compiler that could make this clean and safe is like fusion power, it has always been 10 years away my entire career.

replies(1): >>43385562 #
103. rat87 ◴[] No.43385422{3}[source]
I don't think what something was written in should count. Barring bugs, it should still be memory safe. But I believe the JVM has FFI, and as soon as you use FFI you risk messing up that memory safety.
replies(1): >>43386030 #
104. vlovich123 ◴[] No.43385562{6}[source]
Presumably a bounds check on the mask could be done, or a safe variant exposed that does that trick under the hood. But yeah, I don't disagree that "safe SIMD" is unlikely to scratch the itch for various applications, but hopefully it'll at least scratch enough of them that the remaining unsafe is reduced.
replies(1): >>43385608 #
105. oneshtein ◴[] No.43385566{10}[source]
AI rewrote it to avoid undefined behavior:

  #include <limits.h>

  int average(int x, int y) {
    long sum = (long)x + y;
    if(sum > INT_MAX || sum < INT_MIN)
        return -1; // or any value that indicates an error/overflow

    return (int)(sum / 2);
  }
replies(5): >>43386128 #>>43386231 #>>43386269 #>>43386613 #>>43396071 #
106. fooker ◴[] No.43385608{7}[source]
No, a bounds check defeats the purpose of SIMD in these cases
replies(1): >>43390317 #
107. NobodyNada ◴[] No.43385620{6}[source]
Rust doesn't have compiler-magic support for anything like a vector. The language has syntax for fixed-sized arrays on the stack, and it supports references to variable-length slices; but it has no magic for constructing variable-length slices (e.g. C++'s `new[]` operator). In fact, the compiler doesn't really "know" about the heap at all.

Instead, all that functionality is written as Rust code in the standard library, such as Vec. This is what I mean by using unsafe code to "teach" the borrow checker: the language itself doesn't have any notion of growable arrays, so you use unsafe to define its semantics and interface, and now the borrow checker understands growable arrays. The alternative would be to make growable arrays some kind of compiler magic, but that's both harder to implement correctly and not generalizable.

> you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.

That's true and that's a great design pattern in C as well. But there are some crucial differences:

- Rust has no undefined behavior outside of unsafe blocks. This means you only need to audit unsafe blocks (and any invariants they assume) to be sure your program is UB-free. C does not have this property even if you code defensively at interface boundaries.

- In Rust, most of the invariants can be checked at compile time; the need for runtime asserts is less than in C.

- C provides no way to defend against dangling pointers without additional tooling & runtime overhead. For instance, if I write a dynamic vector and get a pointer to the element, there's no way to prevent me from using that pointer after I've freed the vector, or appended an element causing the container to get reallocated elsewhere.

Rust isn't some kind of silver bullet where you feed it C-like code and out comes memory safety. It's also not some kind of high-overhead garbage collected language where you have to write unsafe whenever you care about performance. Rather, Rust's philosophy is to allow you to define fundamental operations out of small encapsulated unsafe building blocks, and its magic is in being able to prove that the composition of these operations is safe, given the soundness of the individual components.

The stdlib provides enough of these building blocks for almost everything you need to do. Unsafe code in library/systems code is rare and used to teach the language of new patterns or data structures that can't be expressed solely in terms of the types exposed by the stdlib. Unsafe in application-level code is virtually never necessary.

108. j-krieger ◴[] No.43385651[source]
This is not really true. You have to uphold those guarantees yourself. With unsafe preconditions, if you don't, the code will still crash loudly (which is better than undefined behaviour).
replies(1): >>43386098 #
109. j-krieger ◴[] No.43385656[source]
> Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees

Inside that block, both yes and no. You have to enforce those nice guarantees yourself. Code that violates them will still crash.

110. branko_d ◴[] No.43385670[source]
Hydrogen sulfide is highly toxic (it's comparable to carbon monoxide). I doubt anyone in their right mind would put it intentionally in a place where it could leak around humans.

But it can occur naturally in natural gas.

replies(2): >>43385731 #>>43386126 #
111. hyperbrainer ◴[] No.43385686{3}[source]
You are lucky not to have smelled mercaptan (which is what is actually put in). Much, much worse than H2S
replies(1): >>43387992 #
112. k1t ◴[] No.43385731{3}[source]
I assume GP was referring to mercaptan, or similar. i.e. Something with a distinctive bad smell.

https://en.m.wikipedia.org/wiki/Methanethiol

113. exDM69 ◴[] No.43385883{4}[source]
They are marked as unsafe because there are hundreds and hundreds of intrinsics, some of which do memory access, some have side effects and others are arithmetic only. Someone would have to individually review them and explicitly mark the safe ones.

There was a bug open about it and the rationale was that no one with the expertise (some of these are quite arcane) was stepping up to do it. (edit: other comments in this thread suggest that this effort is now underway and first changes were committed a few weeks ago)

You can do safe SIMD using std::simd but it is nightly only at this point.
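
For reference, a minimal sketch of what nightly-only std::simd looks like (the feature gate and exact API are unstable and may change):

  #![feature(portable_simd)]
  use std::simd::f32x4;

  fn main() {
      let a = f32x4::from_array([1.0, 2.0, 3.0, 4.0]);
      let b = f32x4::from_array([10.0, 20.0, 30.0, 40.0]);
      // Lane-wise addition with no unsafe, unlike the raw intrinsics.
      let c = a + b;
      assert_eq!(c.to_array(), [11.0, 22.0, 33.0, 44.0]);
  }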

114. sunshowers ◴[] No.43386030{4}[source]
Does it help to think of "safe Rust" as a language that's written in "unsafe Rust"? That's basically what it is.
115. pjmlp ◴[] No.43386071[source]
It is also an idea that traces back to 1960s systems languages, one that was apparently unknown at Bell Labs.
116. pjmlp ◴[] No.43386079{10}[source]
Unfortunately, C++ in its last several revisions has gotten that sequence wrong: many ideas are now specified on paper (the standard PDF) before showing up in any compiler, sometimes years later.

Fully-thought-out and feature-complete proposals have hardly been the norm since C++17.

117. pjmlp ◴[] No.43386095{12}[source]
Since C++17, hardly anything goes "through boost, folly, absl, clang, or GCC (or are vendor-specific features)" before going to std.
118. littlestymaar ◴[] No.43386098[source]
With unsafe you get exactly the same kind of semantics as C: if you don't uphold the invariants the unsafe functions expect, you end up with UB, exactly like in C.

If you want a clean crash instead of nondeterministic behavior, you need to use assert like in C, but that won't save you from compiler optimizations removing checks that are deemed useless (again, exactly like in C).

replies(2): >>43386272 #>>43388759 #
119. johnisgood ◴[] No.43386111{3}[source]
Ada is even better at checking for correctness. It needs to be talked about more. "Safer than C" has long meant Ada; people did not know this before they jumped on the Rust bandwagon.
120. littlestymaar ◴[] No.43386126{3}[source]
> Hydrogen sulfide is highly toxic (it's comparable to carbon monoxide)

It's a bad comparison, since CO doesn't smell, which is what makes it dangerous, while H2S is detected by our sense of smell at concentrations far below the toxic dose (in fact, its biggest danger comes from the fact that at hazardous concentrations it stops smelling of anything at all, because our receptors become saturated).

It's not what's put in natural gas, but it wouldn't be that dangerous if it were.

121. Jaxan ◴[] No.43386128{11}[source]
I’m not convinced that solution is much better. It can be improved to x/2 + y/2 (which still gives the wrong answer if both inputs are odd).
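
As an aside, a hedged Rust sketch of the widening fix under discussion; using i64 sidesteps the int/long width portability questions raised elsewhere in the thread:

  // Widen to i64, add without any possibility of overflow, then divide;
  // the result of (x + y) / 2 always fits back into an i32.
  fn average(x: i32, y: i32) -> i32 {
      ((x as i64 + y as i64) / 2) as i32
  }

  fn main() {
      assert_eq!(average(i32::MAX, i32::MAX), i32::MAX);
      assert_eq!(average(7, 8), 7); // truncates toward zero, like C
  }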
122. pjmlp ◴[] No.43386156{3}[source]
Depends on which JVM you are talking about, some are 100% Java, some are a mix of Java and C, others are a mix of Java and C++, in all cases a bit of Assembly as well.
123. pjmlp ◴[] No.43386157{4}[source]
Depends on which JVM you are talking about, some are 100% Java, some are a mix of Java and C, others are a mix of Java and C++, in all cases a bit of Assembly as well.
replies(1): >>43386246 #
124. ◴[] No.43386165{3}[source]
125. ricardobeat ◴[] No.43386189[source]
Is this a sloppy codebase? I browsed through a few random files, and easily 90% of functions are marked unsafe.
126. josefx ◴[] No.43386231{11}[source]
> long sum = (long)x + y;

There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.

https://www.gnu.org/software/libc/manual/html_node/Range-of-...

> return -1; // or any value that indicates an error/overflow

-1 is a perfectly valid average for various inputs. You could return the larger type to encode an error value that is not a valid output, or return the error and the average in two distinct variables.

AI and C seem like a match made in hell.

replies(1): >>43389904 #
127. throwaway2037 ◴[] No.43386246{5}[source]
You are right. I should have been more clear. I am talking about the bog standard one that most people use from Oracle/OpenJDK. A long time back it was called "HotSpot JVM". That one has source code available on GitHub. It is mostly C++ with a little bit of C and assembly.
replies(1): >>43386336 #
128. throwaway2037 ◴[] No.43386269{11}[source]
I don't know why this answer was downvoted. It adds valuable information to this discussion. Yes, I know that someone already pointed out that sizeof(int) is not guaranteed on all platforms to be smaller than sizeof(long). Meh. Just change the type to long long, and it works well.
replies(4): >>43386284 #>>43386391 #>>43389387 #>>43396082 #
129. lenkite ◴[] No.43386272{3}[source]
> With unsafe you get exactly the same kind of semantics as C

People seem to disagree.

Unsafe Rust Is Harder Than C

https://chadaustin.me/2024/10/intrusive-linked-list-in-rust/

https://news.ycombinator.com/item?id=41944121

replies(1): >>43390805 #
130. gf000 ◴[] No.43386284{12}[source]
It literally returns a valid output value as an error.
replies(1): >>43389527 #
131. pjmlp ◴[] No.43386336{6}[source]
Define mostly, https://github.com/openjdk/jdk

- Java 74.1%

- C++ 14.0%

- C 7.9%

- Assembly 2.7%

And those values have been increasing for Java with each OpenJDK release.

replies(1): >>43386648 #
132. rob74 ◴[] No.43386386{3}[source]
TIL also - until today, I thought it was just "mercaptan". Turns out there are actually two variants of that:

> Ethanethiol (EM), commonly known as ethyl mercaptan is used in liquefied petroleum gas (LPG) and resembles odor of leeks, onions, durian, or cooked cabbage

> Methanethiol, commonly known as methyl mercaptan, is added to natural gas as an odorant, usually in mixtures containing methane. Its smell is reminiscent of rotten eggs or cabbage.

...but you can still call it "mercaptan" and be ~ correct in most cases.

133. josefx ◴[] No.43386391{12}[source]
> Meh. Just change the type to long long, and it works well.

C libraries tend to support a lot of exotic platforms. zlib for example supports Unicos, where int, long int and long long int are all 64 bits large.

134. immibis ◴[] No.43386569[source]
Clearly marking unsafe code is no good for safety, if you have many marked areas.

Some codebases, you can grep for "unsafe", find no results, and conclude the codebase is safe... if you trust its dependencies.

This is not one of those codebases. This one uses unsafe liberally, which tells you it's about as safe as C.

"unsafe behaviour is clearly marked" seems to be a thought-stopping cliche in the Rust world. What's the point of marking them, if you still have them? If every pointer dereference in C code had to be marked unsafe (or "please" like in Intercal), that wouldn't make C any better.

135. immibis ◴[] No.43386613{11}[source]
We're about to see a huge uptick in bugs worldwide, aren't we?
136. saagarjha ◴[] No.43386634{8}[source]
I take it that you consider most major projects written in C not to be "good"?
replies(1): >>43389500 #
137. saagarjha ◴[] No.43386642{6}[source]
For example, many autovectorizers get upset if you put control flow in your loop.
138. saagarjha ◴[] No.43386648{7}[source]
JDK≠JVM
replies(1): >>43386750 #
139. gigatexal ◴[] No.43386705[source]
Someone mentioned to me that for something as simple as a linked list you have to use unsafe in Rust.

Update: it's how the std lib does it: https://doc.rust-lang.org/src/alloc/collections/linked_list....

replies(5): >>43386891 #>>43387304 #>>43390238 #>>43391048 #>>43392633 #
140. pjmlp ◴[] No.43386750{8}[source]
If you are only talking about libjvm.so you would be right, then again that alone won't do much help for Java developers.
replies(1): >>43421298 #
141. GTP ◴[] No.43386840{6}[source]
Which is just a convoluted way of saying that it is possible to write bugs in any language. Still, it's undeniable that some languages do a better job than others of helping you avoid certain bugs.
142. umanwizard ◴[] No.43386891{3}[source]
No you don’t. You can use the standard linked list that is already included in the standard library.

Coming up with these niche examples of things you need unsafe for, in order to discredit Rust's safety guarantees, is just not interesting. What fraction of programmer time is spent writing custom linked lists? Surely way less than 1%. In most of the other 99%, Rust is very helpful.
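
For completeness, a quick sketch of using the stdlib's ready-made list, with no unsafe anywhere in sight:

  use std::collections::LinkedList;

  fn main() {
      let mut list: LinkedList<u32> = LinkedList::new();
      list.push_back(1);
      list.push_front(0);
      assert_eq!(list.into_iter().collect::<Vec<_>>(), vec![0, 1]);
  }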

replies(1): >>43388348 #
143. taejo ◴[] No.43387099{4}[source]
Mercaptans are a group of compounds, more than one of which is used as a gas odorant, so in some places gas smells of rotten eggs, similar to H2S, while in others it doesn't smell like that at all, but has a quite distinct smell reminiscent of garlic and durian.
144. ohmygoodniche ◴[] No.43387304{3}[source]
I love how the most common negative thing I hear about Rust is that a really uncommon data structure, one nobody should write by hand and should almost always import, has to be written using the unsafe language feature. Meanwhile, Rust applications tend in most cases to be considerably faster, more correct, and more enjoyable to maintain than those written in other languages. Must be a really awesome technology.
145. gpderetta ◴[] No.43387323{7}[source]
As you point out later, a SIGABRT or a SIGBUS would both be perfectly safe and really no different from a panic. With enough infra you could convert them to a panic anyway (but it's probably not worth the effort).
replies(1): >>43388398 #
146. andrewchambers ◴[] No.43387402[source]
It's more like letting a wet dog who you are watching closely quickly pass from your front door to the shower.
147. thrance ◴[] No.43387638{7}[source]
Also, AFAIK panics are not always recoverable in Rust. You can compile your project with `panic = "abort"`, in which case the program will quit immediately whenever a panic is encountered.
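
(For reference, that's a one-line profile setting in Cargo.toml:)

  [profile.release]
  panic = "abort"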
replies(1): >>43388463 #
148. tmtvl ◴[] No.43387667{3}[source]
Is there such a boundary? How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?

The usual retort to these questions is 'well, the standard library uses unsafe code, so everything would need a disclaimer that it uses unsafe code, so that's a useless remark to make', but the basic issue still remains that the only clear boundary is whether a function 'contains' unsafe code, not whether a function 'calls' unsafe code.

If Rust did not have a mechanism to use external code then it would be fine because the only sources of unsafe code would be either the application itself or the standard library so you could just grep for 'unsafe' to find the boundaries.

replies(3): >>43389854 #>>43390196 #>>43396112 #
149. throwaway150 ◴[] No.43387992{4}[source]
I have. It's worse no doubt. But it's not the smell of rotten eggs. My comment was meant to be tongue-in-cheek to correct the mistake of saying "H2S" in the GP comment.
replies(1): >>43390029 #
150. vikramkr ◴[] No.43388348{4}[source]
I think the point is that it's funny that the standard library has to use unsafe to implement a data structure that's like the second data structure you learn in an intro to CS class
replies(3): >>43388447 #>>43388583 #>>43389181 #
151. jchw ◴[] No.43388398{8}[source]
Well, that's the thing though: in terms of Rust and Go and other safe programming languages, CPU exceptions are not "safe" even though they are not inherently dangerous. The point is that the subset of the language that is safe can't generate them, period. They are not accounted for in safe code.

There are uses for this, especially since some code will run in environments where you can not simply handle it, but it's also just cleaner this way; you don't have to worry about the different behaviors between operating systems and possibly CPU architectures with regards to error recovery if you simply don't generate any.

Since there are these edge cases where it wouldn't be possible to handle faults easily (e.g. some kernel code) it needs to be considered unsafe in general.

replies(1): >>43393003 #
152. Sharlin ◴[] No.43388447{5}[source]
Yeah, but Rust just proves the point here that (doubly) linked lists

a) are surprisingly nontrivial to get right,

b) have almost no practical uses, and

c) are only taught because they're conceptually nice and demonstrate pointers and O(1) vs O(n) tradeoffs.

Note that safe Rust has no problems with singly-linked lists or in general any directed tree structure.
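
To make that concrete, a minimal sketch of a singly-linked stack in entirely safe Rust; each node is uniquely owned by its predecessor, so the borrow checker is satisfied:

  struct Node<T> {
      value: T,
      next: Option<Box<Node<T>>>,
  }

  struct List<T> {
      head: Option<Box<Node<T>>>,
  }

  impl<T> List<T> {
      // Push onto the front; `take` moves the old head into the new node.
      fn push(&mut self, value: T) {
          self.head = Some(Box::new(Node { value, next: self.head.take() }));
      }

      fn pop(&mut self) -> Option<T> {
          self.head.take().map(|node| {
              self.head = node.next;
              node.value
          })
      }
  }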

153. jchw ◴[] No.43388463{8}[source]
Sure, but that is beside the point: if you compile code like that, you're intentionally making panics unrecoverable. The nature of panics from the language perspective is not any different; you're still in a well-defined state when it happens.

It's also possible to go a step further and practice "panic-free" Rust, where you write code in such a way that it never links in the panic handler. It seems pretty hard to do, but might be worth it sometimes, especially in an environment where you don't have anything sensible to do on a panic.

154. umanwizard ◴[] No.43388583{5}[source]
Why is it particularly funny?

C has to make a syscall to the kernel which ultimately results in a BIOS interrupt to implement printf, which you need for the hello world program on page 1 of K&R.

Does that mean that C has no abstraction advantage over directly coding interrupts with asm? Of course not.

replies(1): >>43389729 #
155. j-krieger ◴[] No.43388759{3}[source]
> With unsafe you get exactly the same kind of semantics as C, if you don't uphold the invariant the unsafe functions expect, you end up with UB exactly like in C.

This is not exactly true. Even in production code, unsafe precondition checks catch violations of some of these rules.

Here: https://doc.rust-lang.org/core/macro.assert_unsafe_precondit... And here: https://google.github.io/comprehensive-rust/unsafe-rust/unsa...
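
A hedged sketch of one such check firing: with debug assertions enabled, this aborts with an "unsafe precondition(s) violated" panic instead of proceeding into UB:

  use std::ptr::NonNull;

  fn main() {
      let p: *mut u8 = std::ptr::null_mut();
      // SAFETY: deliberately violated for demonstration. In a debug build
      // the stdlib's precondition check catches the null pointer and
      // aborts loudly rather than constructing an invalid NonNull.
      let _bad = unsafe { NonNull::new_unchecked(p) };
  }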

replies(1): >>43390437 #
156. cmrdporcupine ◴[] No.43389051{8}[source]
The issue is that it's been sitting in nightly for years. Many, many years.

I don't write software targeting nightly, for good reason.

157. tux3 ◴[] No.43389181{5}[source]
No, that's how the feature is supposed to work.

You design an abstraction which is unsafe inside and exposes a safe API to users. That is really how unsafe is meant to be used.

Of course the standard library uses unsafe. This is where you want unsafe to be, not in random user code. That's what it was made for.

158. NobodyNada ◴[] No.43389387{12}[source]
Copypasting a comment into an LLM, and then copypasting its response back is not a useful contribution to a discussion, especially without even checking to be sure it got the answer right. If I wanted to know what an LLM had to say, I can go ask it myself; I'm on HN because I want to know what people have to say.
replies(1): >>43389546 #
159. sophacles ◴[] No.43389500{9}[source]
Most major software projects are not good, no matter what language.
160. oneshtein ◴[] No.43389527{13}[source]
An error value is valid output in both cases.
replies(1): >>43393545 #
161. ◴[] No.43389546{13}[source]
162. cesarb ◴[] No.43389729{6}[source]
> C has to make a syscall to the kernel which ultimately results in a BIOS interrupt to implement printf,

That's not the case since the late 1990s. Other than during early boot, nobody calls into the BIOS to output text, and even then "BIOS interrupt" is not something normally used anymore (EFI uses direct function calls through a function table instead of going through software interrupts).

What really happens in the kernel nowadays is direct memory access and direct manipulation of I/O ports and memory mapped registers. That is, all modern operating systems directly manipulate the hardware for text and graphics output, instead of going through the BIOS.
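
As a hedged bare-metal illustration (the address is a legacy x86 platform fact, and this is only meaningful in a suitable freestanding environment):

  // Write 'A' (white on black) to the top-left corner of the legacy VGA
  // text-mode buffer, memory-mapped at physical address 0xb8000.
  fn vga_hello() {
      let vga = 0xb8000 as *mut u8;
      unsafe {
          vga.write_volatile(b'A');        // character byte
          vga.add(1).write_volatile(0x0f); // attribute byte
      }
  }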

replies(1): >>43389918 #
163. steveklabnik ◴[] No.43389854{4}[source]
> How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?

The point is that you don't need to. The guarantees compose.

> The usual retort to these questions is 'well, the standard library uses unsafe code

It's not about the standard library, it's much more fundamental than that: hardware is not memory safe to access.

> If Rust did not have a mechanism to use external code then it would be fine

This is what GC'd languages with runtimes do. And even they almost always include FFI, which lets you call into arbitrary code via the C ABI, allowing for unsafe things. Rust is a language intended to be used at the bottom of the stack, and so has more first-class support, calling it "unsafe" instead of FFI.

164. cesarb ◴[] No.43389904{12}[source]
> There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.

That used to be the case for 32-bit platforms, but most 64-bit platforms in which GNU libc runs use the LP64 model, which has 32-bit int and 64-bit long. That documentation seems to be a bit outdated.

(One notable 64-bit platform which uses 32-bit for both int and long is Microsoft Windows, but that's not one of the target platforms for GNU libc.)

165. umanwizard ◴[] No.43389918{7}[source]
Thanks for the information (I mean that genuinely, not sarcastically — I do really find it interesting). But it doesn’t really impact my point.
166. hyperbrainer ◴[] No.43390029{5}[source]
If that is the case (and I have no reason to believe otherwise), I apologise. Should work on detecting tone better.
167. cesarb ◴[] No.43390196{4}[source]
> Is there such a boundary? How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?

Yes, there is a boundary, and usually it's either the function itself, or all methods of an object. For instance, a function I wrote recently goes somewhat like this:

  fn read_unaligned_u64_from_byte_slice(src: &[u8]) -> u64 {
    // Panics (safely) unless `src` is exactly 8 bytes long.
    assert_eq!(src.len(), size_of::<u64>());
    // SAFETY: the assert above guarantees `src` holds exactly 8 readable
    // bytes, and `read_unaligned` has no alignment requirement.
    unsafe { std::ptr::read_unaligned(src.as_ptr().cast::<u64>()) }
  }

The read_unaligned function (https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html) has two preconditions which have to be checked manually. When doing so, you'll notice that the "src" argument must have at least 8 bytes for these preconditions to be met; the "assert_eq!()" call before that unsafe block ensures that (it will safely panic unless the "src" slice has exactly 8 bytes). That is, my "read_unaligned_u64_from_byte_slice" function is safe, even though it calls unsafe code; the function is the boundary between safe and unsafe code. No callers of that function have to worry that it calls unsafe code in its implementation.
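
A quick usage sketch of the function above:

  fn main() {
      let bytes = [1u8, 2, 3, 4, 5, 6, 7, 8];
      let v = read_unaligned_u64_from_byte_slice(&bytes);
      assert_eq!(v.to_ne_bytes(), bytes); // native-endian round trip
      // Passing a slice of any other length would panic safely, not UB:
      // read_unaligned_u64_from_byte_slice(&bytes[..4]);
  }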
168. estebank ◴[] No.43390238{3}[source]
Note that that is a doubly linked list, because it is a "soup of ownership" data structure. A singly linked list has clear ownership so it can be modelled in safe Rust.

On modern architectures you shouldn't use either unless you have an extremely niche use case. They are no longer general-purpose data structures in a world where cache locality matters.

169. vlovich123 ◴[] No.43390317{8}[source]
Not necessarily if you can hoist the bounds check outside of the loop somehow.
170. bangaladore ◴[] No.43390437{4}[source]
Quoted from your link

> Safe Rust: memory safe, no undefined behavior possible. Unsafe Rust: can trigger undefined behavior if preconditions are violated.

So unsafe Rust, from a UB perspective, is no different from C/C++: if preconditions are violated, UB can occur, affecting anywhere in the program. It's unclear how the compiler could check anything about preconditions in a block that explicitly exists to say the developer is the one upholding them.

replies(2): >>43392757 #>>43397883 #
171. kibwen ◴[] No.43390805{4}[source]
Using references in unsafe Rust is harder than using raw pointers in C.

Using raw pointers in unsafe Rust is easier than using raw pointers in C.

The solution is to not manipulate references in unsafe code. The problem is that in old versions of Rust this was tricky. Modern versions of Rust have addressed this by adding first-class facilities for producing pointers without needing temporary references: https://blog.rust-lang.org/2024/10/17/Rust-1.82.0.html#nativ...
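
A minimal sketch of that 1.82 syntax (the packed struct is illustrative; taking an ordinary reference to its misaligned field would be instant UB):

  #[repr(packed)]
  struct Packed {
      a: u8,
      b: u64, // at offset 1, so `&p.b` would be undefined behavior
  }

  fn main() {
      let p = Packed { a: 1, b: 7 };
      assert_eq!(p.a, 1);
      // `&raw const` produces the pointer without ever creating a
      // temporary reference, so no alignment rule is violated.
      let ptr: *const u64 = &raw const p.b;
      // SAFETY: the pointer is valid for reads; read_unaligned tolerates
      // the misalignment.
      let value = unsafe { ptr.read_unaligned() };
      assert_eq!(value, 7);
  }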

172. miki123211 ◴[] No.43391048{3}[source]
This is far less of a problem than it would be in a C-like language, though.

You can implement that linked list just once, audit the unsafe parts extensively, provide a fully safe API to clients, and then just use that safe API in many different places. You don't need thousands of project-specific linked list reimplementations.

173. kazinator ◴[] No.43392018[source]
> clearly marked by the unsafe block.

Rust has macros; are macros prohibited from generating unsafe blocks, so that macro invocations don't have to be suspected of harboring unsafe code?

replies(1): >>43392644 #
174. uecker ◴[] No.43392234{7}[source]
There is definitely a distinction between safe and unsafe code in C, it is just not a simple binary distinction. But this does not make it impossible to screen C for unsafe constructions and it also does not mean that detecting unsafe issues in Rust is always trivial.
175. uecker ◴[] No.43392246{9}[source]
But this is also easy to protect against if you use the tools available to C programmers. It is part of the Rust hype that we would be completely helpless here, but this is far from the truth.
176. uecker ◴[] No.43392262{7}[source]
Rust is better at this yes, but the practical advantage is not necessarily that huge.
177. all2well ◴[] No.43392633{3}[source]
Don't Arc and Weak work for doubly linked lists? The Rust docs recommend Weak as a way to break pointer cycles: https://doc.rust-lang.org/std/sync/struct.Arc.html#breaking-...
178. steveklabnik ◴[] No.43392644[source]
No; macros can expand to unsafe blocks, just like function bodies can contain them.
179. randomNumber7 ◴[] No.43392757{5}[source]
The Rust compiler was written by Chuck Norris.
180. comex ◴[] No.43393003{9}[source]
That’s largely true, but there are some exceptions (pun not intended).

In Rust, the CPU exception resulting from a stack overflow is considered safe. The compiler uses stack probing to ensure that as long as there is at least one page of unmapped memory below the stack (guard page), the program will reliably fault on it rather than continuing to access memory further below. In most environments it is possible to set up a guard page, including Linux kernel code if CONFIG_VMAP_STACK is enabled. But there are other environments where it’s not, such as WebAssembly and some microcontrollers. In those environments, the backend would have to add explicit checks to function prologs to ensure enough stack is available. I say “would have to”, not “does”: I’ve heard that on at least the microcontrollers, there are no such checks and Rust is just unsound at the moment. Not sure about WebAssembly.

Meanwhile, Go uses CPU exceptions to handle nil dereferences.

replies(1): >>43393106 #
181. jchw ◴[] No.43393106{10}[source]
Yeah, I glossed over the Rust stack overflow case. I don't know why: Literally two parent comments up I did bother to mention it.

That said, I actually entirely forgot Go catches nil derefs in a segfault handler. I guess it's not a big deal since Go isn't really suitable for free-standing environments where avoiding CPU exceptions is sometimes more useful, so there's no particular reason why the runtime can't rely on it.

182. MaxBarraclough ◴[] No.43393545{14}[source]
The code is unarguably wrong.

average(INT_MAX, INT_MAX) should return INT_MAX, but this code gets that wrong and returns -1.

average(0,-2) should not return a special error-code value, but this code will do just that, making -1 an ambiguous output value.

Even its comment is wrong. We can see from the signature of the function that there can be no value that indicates an error, as every possible value of int may be a legitimate output value.

It's possible to implement this function in a portable and standard way though, along the lines of [0].

[0] https://stackoverflow.com/a/61711253/ (Disclosure: this is my code.)

replies(1): >>43396843 #
183. umanwizard ◴[] No.43396071{11}[source]
Please stop posting AI-generated content to HN. It’s clear the majority of users hate it, given that it gets swiftly downvoted every time it’s posted.
184. umanwizard ◴[] No.43396082{12}[source]
I always downvote all AI-generated content regardless of whether it’s right or wrong, because I would like to discourage people from posting it.
185. umanwizard ◴[] No.43396097{4}[source]
Rust doesn’t have classes, nor can const values be modified, even in unsafe code. (did you mean “immutable”?)
186. umanwizard ◴[] No.43396112{4}[source]
The point of rust isn’t to formally prove that there are no bugs. It’s just to make writing certain classes of bugs harder. That’s what people are missing when they point out that yes, it’s possible to circumvent safety mechanisms. It’s a strawman: bulletproof, guaranteed security simply isn’t a design goal of rust.
187. MaxBarraclough ◴[] No.43396843{15}[source]
Too late for me to edit: as josefx pointed out, it also fails to properly address the undefined behavior. The sums INT_MAX + INT_MAX and INT_MIN + INT_MIN may still overflow despite being done using the long type.

That won't occur on an 'LP64' platform, [0] but we should aim for proper portability and conformance to the C language standard.

[0] https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_m...

188. j-krieger ◴[] No.43397883{5}[source]
> So Unsafe Rust from a UB perspective is no different than C/C++. If preconditions are violated, UB can occur

Only if you actively disable the panics that fire when unsafe preconditions are violated. In most code, the program will crash instead. Panicking by default on precondition violations in production code was enabled last year, IIRC.

> Its unclear how the compiler could check anything about preconditions

It can't. This is done at runtime, by default, and without any manual programmer interaction.

You can see an example of this in the `ptr`module, here: https://doc.rust-lang.org/beta/src/core/ptr/mod.rs.html#1071

Some are only enabled under `debug_assertions` (on by default in debug builds); see `ptr::read`, here: https://doc.rust-lang.org/beta/src/core/ptr/mod.rs.html#1370

replies(1): >>43402637 #
189. bangaladore ◴[] No.43402637{6}[source]
These seem to be beta features. But in any case it seems like its just doing some number of asserts to validate some preconditions.

However, even at runtime it can't do anything to say whether (excuse the C pseudocode) *(uint32_t*)0x1C00 = 0xFE is a valid memory operation. On some systems, in some cases, it might be.

replies(1): >>43410104 #
190. j-krieger ◴[] No.43410104{7}[source]
> These seem to be beta features

What? Where did you get that impression?

> But in any case it seems like its just doing some number of asserts to validate some preconditions

Yeah, like C code normally would, just in the STD in this case.

replies(1): >>43465052 #
191. saagarjha ◴[] No.43421298{9}[source]
That is what most people are talking about when they are discussing the JVM, yes
192. uecker ◴[] No.43445900{9}[source]
You can tell a C compiler to trap or wrap around on overflow, or you can use checked arithmetic to test explicitly for overflow.
193. bangaladore ◴[] No.43465052{8}[source]
> What? Where did you get that impression?

https://doc.rust-lang.org/beta/

> Yeah, like C code normally would, just in the STD in this case.

Yes, in that manual checks are still needed. My point is that unsafe code in Rust is nowhere near safe and cannot be considered safe without extensive analysis, no matter the language features used.