Shared_ptr<T>: the (not always) atomic reference counted smart pointer (2019)

(snf.github.io)

51 points klaussilveira | 4 comments | 27 Aug 25 14:34 UTC


sesuximo ◴[31 Aug 25 12:36 UTC] No.45082719[source]▶

>>45040290 (OP) #

Why is the atomic version slower? Is it slower on modern x86?

replies(2): >>45082778 #>>45083133 #

eptcyka ◴[31 Aug 25 12:50 UTC] No.45082778[source]▶

>>45082719 #

Atomic write operations force a cache line flush and can wait until the memory is updated. Atomic reads have to be read from memory or a shared cache. Atomics are slow because memory is slow.

replies(3): >>45082805 #>>45083625 #>>45085604 #

Krssst ◴[31 Aug 25 12:57 UTC] No.45082805[source]▶

>>45082778 #

I don't think an atomic operation necessarily demands a cache flush. L1 cache lines can move across cores as needed in my understanding (maybe not on multi-socket machines?). Barriers are required if further memory ordering guarantees are needed.

replies(1): >>45082876 #

ot ◴[31 Aug 25 13:07 UTC] No.45082876[source]▶

>>45082805 #

Not an L1/L2/... cache flush, but a store buffer flush, at least on x86. This is true for LOCK instructions. Loads/stores (again on x86) are always acquire/release, so they don't need additional fences if you don't need seq-cst. However, seq-cst atomics in C++ lower stores to LOCK XCHG, so you get a fence.
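
To make the distinction above concrete, here is a minimal sketch of the same store written with release ordering and with the default sequentially consistent ordering. The names `ready`, `payload`, `publish_release`, `publish_seq_cst`, `consume`, and `demo_publish` are hypothetical, and the instruction selection described in the comments is what mainstream x86-64 compilers typically emit, not something the C++ standard guarantees.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag/payload pair, used only for illustration.
std::atomic<int> ready{0};
int payload = 0;

// Release store: on x86-64 this is typically a plain MOV, because
// ordinary x86 stores already have release semantics.
void publish_release(int value) {
    payload = value;
    ready.store(1, std::memory_order_release);
}

// Sequentially consistent store (the default ordering): on x86-64 this is
// typically lowered to XCHG (implicitly locked) or MOV followed by MFENCE,
// either of which drains the store buffer.
void publish_seq_cst(int value) {
    payload = value;
    ready.store(1);  // same as std::memory_order_seq_cst
}

// Acquire load: typically a plain MOV on x86-64.
int consume() {
    while (ready.load(std::memory_order_acquire) == 0) {
        // spin until the flag is published
    }
    return payload;
}

// Single-threaded walk-through of both variants.
int demo_publish() {
    publish_release(41);
    assert(consume() == 41);
    ready.store(0, std::memory_order_relaxed);  // reset the flag
    publish_seq_cst(42);
    return consume();
}
```

Both variants are correct for this publish/consume pattern; the difference is only in how much ordering (and therefore how much cost) the store asks for.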

replies(1): >>45083031 #

tialaramex ◴[31 Aug 25 13:30 UTC] No.45083031[source]▶

>>45082876 #

There is no way the shared_ptr<T> is using the expensive sequentially consistent atomic operations.

Even if you're one of the crazy people who thinks that's the sane default, the value of analysing and choosing a better ordering rule for this key type is enormous, and when you do that analysis your answer is going to be acquire-release and only for some edge cases, in many places the relaxed atomic ordering is fine.

replies(3): >>45083183 #>>45084954 #>>45097016 #

1. loeg ◴[31 Aug 25 13:57 UTC] No.45083183[source]▶

>>45083031 #

> when you do that analysis your answer is going to be acquire-release and only for some edge cases, in many places the relaxed atomic ordering is fine.

Why would shared_ptr refcounting need anything other than relaxed? Acq/rel are for implementing multi-variable atomic protocols, and shared_ptr refcounting simply doesn't have other variables.

replies(3): >>45083552 #>>45083875 #>>45084525 #

2. tialaramex ◴[31 Aug 25 14:47 UTC] No.45083552[source]▶

>>45083183 (TP) #

It's extremely difficult to see in real C++ standard library source because of the layers of obfuscating compiler-workaround hacks, but ultimately they do use acquire-release ordering, though only for decrementing the reference count. Does that help you figure out why we want acquire-release, or do you need more help?

3. dataflow ◴[31 Aug 25 15:22 UTC] No.45083875[source]▶

>>45083183 (TP) #

It's because you're not solely managing the refcount here. Other memory locations have a dependence on the refcount, given that you're also deleting the object after the refcount reaches zero. That means you need all writes to have completed at that point, and all reads to observe that. Otherwise you might destroy an object while it's in an invalid state, or you might release the memory while another thread is accessing it.
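
The dependency described above can be sketched with a minimal hand-rolled reference count. This is an illustration under stated assumptions, not how any particular standard library implements shared_ptr: `Counted`, `retain`, `release`, and `demo_refcount` are hypothetical names, and real control blocks also handle weak counts, custom deleters, and other details.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical control block, for illustration only.
struct Counted {
    std::atomic<long> refs{1};
    int value = 0;
    bool* destroyed = nullptr;  // lets the demo observe destruction
    ~Counted() {
        if (destroyed) *destroyed = true;
    }
};

// Taking another reference: the caller already holds one, so the object
// cannot disappear underneath it, and relaxed ordering is enough.
void retain(Counted* p) {
    p->refs.fetch_add(1, std::memory_order_relaxed);
}

// Dropping a reference: the release half publishes this thread's writes to
// the object; the acquire half, on the final decrement, makes every other
// owner's writes visible before the destructor and deallocation run.
void release(Counted* p) {
    if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete p;
    }
}

// Single-threaded walk-through of the count going 1 -> 2 -> 1 -> 0.
bool demo_refcount() {
    bool destroyed = false;
    Counted* p = new Counted;
    p->destroyed = &destroyed;
    retain(p);   // refs == 2
    release(p);  // refs == 1, object still alive
    assert(!destroyed);
    release(p);  // refs == 0, object deleted
    return destroyed;
}
```

With a relaxed decrement, nothing would order another thread's last writes to the object before the `delete`, which is the invalid-state and use-after-free hazard described above.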

4. Kranar ◴[31 Aug 25 16:38 UTC] No.45084525[source]▶

>>45083183 (TP) #

You need it to avoid a use after free.
