C++’s shared pointer has the same problem; Rust avoids it by having two types (Rc and Arc) that the developer can select from (and which the compiler will prevent you from using unsafely).
C++’s shared pointer has the same problem; Rust avoids it by having two types (Rc and Arc) that the developer can select from (and which the compiler will prevent you from using unsafely).
It doesn't. C++'s shared pointers use atomics, just like Rust's Arc does. There's no good reason (unless you have some very exotic requirements, into which I won't get into here) to implement shared pointers with mutexes. The implementation in the blog post here is just suboptimal.
(But it's true that C++ doesn't have Rust's equivalent of Rc, which means that if you just need a reference counted pointer then using std::shared_ptr is not a zero cost abstraction.)
Sure, cross-platform is desirable, if there's no cost involved, and mandatory if you actually need it, but it's a "nice to have" most of the time, not a "needs this".
As for mutex overheads, yep, that's annoying, but really, how annoying ? Modern CPUs are fast. Very very fast. Personally I'm far more likely to use an os_unfair_lock_t than a pthread_mutex_t (see the previous point) which minimizes the locking to a memory barrier, but even if locking were slow, I think I'd prefer safe.
Rust is, I'm sure, great. It's not something I'm personally interested in getting involved with, but it's not necessary for C (or even this extra header) to do everything that Rust can do, for it to be an improvement on what is available.
There's simply too much out there written in C to say "just use Rust, or Swift, or ..." - too many libraries, too many resources, too many tutorials, etc. You pays your money and takes your choice.
I'd be interested to know what you are thinking.
The primary exotic thing I can imagine is an architecture lacking the ability to do atomic operations. But even in that case, C11 has atomic operations [1] built in. So worst case, the C library for the target architecture would likely boil down to mutex operations.
But I suppose we're wasting time on useless nitpicking. So, fair enough.
Edit: in other words C++ could provide an equivalent of Rc, but we’d see no end of people complaining when they shoot themselves in the foot with it.
(This is what “zero cost abstraction” means: it doesn’t mean no cost, just that the abstraction’s cost is no greater than the semantically equivalent version written by the user. So both Arc and shared_ptr are zero-cost in a MT setting, but only Rust has a zero-cost abstraction in a single-threaded setting.)
> We love its raw speed, its direct connection to the metal
If this is a strong motivating factor (versus, say, refactoring risk), then C’s lack of safe zero-cost abstractions is a valid concern.
Simply put, just as a `unique_ptr` (`Box`) is an entirely different abstraction than `shared_ptr` (`Arc`), an `Rc` is also an entirely different abstraction than `Arc`, and C++ simply happens to completely lack `Rc` (at least in the standard; Boost of course has one). But if it had one you could use it with exactly the same cost as in Rust, you'd just have to manually make sure to not use it across threads (which indeed is easier said than done, which is why it's not in the standard), exactly the same as if you'd manually maintain the reference count without the nice(er) abstraction. Hence "zero cost abstraction".
// Not real MIPS, just what I've gleaned from a brief look at some docs
LOAD addr, register
ADD 1, register
STORE register, addr
The LOAD and STORE are atomic, but the `ADD` happens out of band.That's a problem if any sort of interrupt happens (if you are multi-threading then a possibility). If it happens at the load, then a separate thread can update "addr" which mean the later STORE will stomp on what's there.
x86 and ARM can do
ADD 1, addr
as well as other instructions like "compare and swap" LOAD addr, register
MOV register, register2
ADD 1, register2
COMPARE_AND_SWAP addr, register, register2
if (cas_failed) { try again }It's an implementation detail. They could have used atomic load/store (since c11) to implement the increment/decrement.
TBH I'm not sure what a mutex buys you in this situation (reference counting)
For this use-case, you might not notice. ISTR, when examing the pthreads source code for some platform, that mutexes only do a context switch as a fallback, if the lock cannot be acquired.
So, for most use-cases of this header, you should not see any performance impact. You'll see some bloat, to be sure.
You can't create userspace locks which is a bummer, but the OS has the capability of enforcing locks. That's basically how early locking worked.
The main thing needed to make a correct lock is interrupt protection. Something every OS has.
To go fast, you need atomic operations. It especially becomes important if you are dealing with multiple cores. However, for a single core system atomics aren't needed for the OS to create locks.
I'd much rather it didnt try to be zero-cost and it always used atomics...
loop: LL r2, (r1)
ADD r3, r2, 1
SC r3, (r1)
BEQ r3, 0, loop
NOPYes. Also, almost every platform I know that supports multi threading and atomics doesn’t support atomics between /all/ possible masters. Consider a microcontroller with, say, two Arm cores (multithreaded, atomic-supporting) and a DMA engine.
(for reference, the person above is referring to what's described here: https://snf.github.io/2019/02/13/shared-ptr-optimization/)
The "language" is conventionally thought of as the sum of the effects given by the { compiler + runtime libraries }. The "language" often specifies features that are implemented exclusively in target libraries, for example. You're correct to say that they're not "language features" but the two domains share a single label like "C++20" / "C11" - so unless you're designing the toolchain it's not as significant a difference.
We're down to ~three compilers: gcc, clang, MSVC and three corresponding C++ libraries.
Nit: while it's possible to implement one with just atomic reads and writes, it's generally not trivial/efficient/ergonomic to do so without an atomic composite read-write operation, like a compare-and-swap.
This seems workload dependent; I would expect a lot of workloads to be write-heavy or at least mixed, since copies imply writes to the shared_ptr's control block.
// The old way of manual reference counting
typedef struct {
MatchStore* store;
int ref_count;
pthread_mutex_t mutex;
} SharedStore;There really isn't. Speaking as someone who works in JVM-land, you really can avoid C all the time if you're willing to actually try.
It does. It’s called a process.
Everyone chose convenience and micro-benchmarks by choosing threads instead.
In any case, you could use the provided primitives to wrap the C11 mutex, or any other mutex.
With some clever #ifdef, you can probably have a single or multithreaded build switch at compile time which makes all the mutex stuff do nothing.
BTW don’t fight C for portability, it is unlikely you will win.
Plus, a lot of what I do is on microcontrollers with tens of kilobytes of RAM, not big-iron massively parallel servers where Java is commonly used. The vendor platform libraries are universally provided in C, so unless you want to reimplement the SPI or USB handler code, and probably write the darn rust implementation/Java virtual machine, and somehow squeeze it all in, then no, you can’t really avoid C.
Or assembler for that matter, interrupt routines often need assembly language to get latency down, and memory management (use this RAM address range because it’s “TCM” 1-clock latency, otherwise it’s 5 or 6 clocks and everything breaks…)