Shared_ptr<T>: the (not always) atomic reference counted smart pointer (2019)

(snf.github.io)

51 points klaussilveira | 1 comments | 27 Aug 25 14:34 UTC


sesuximo ◴[31 Aug 25 12:36 UTC] No.45082719[source]▶

>>45040290 (OP) #

Why is the atomic version slower? Is it slower on modern x86?

replies(2): >>45082778 #>>45083133 #

eptcyka ◴[31 Aug 25 12:50 UTC] No.45082778[source]▶

>>45082719 #

Atomic write operations force a cache line flush and can wait until the memory is updated. Atomic reads have to be read from memory or a shared cache. Atomics are slow because memory is slow.

replies(3): >>45082805 #>>45083625 #>>45085604 #

Krssst ◴[31 Aug 25 12:57 UTC] No.45082805[source]▶

>>45082778 #

I don't think an atomic operation necessarily demands a cache flush. L1 cache lines can move across cores as needed in my understanding (maybe not on multi-socket machines?). Barriers are required if further memory ordering guarantees are needed.

replies(1): >>45082876 #

ot ◴[31 Aug 25 13:07 UTC] No.45082876[source]▶

>>45082805 #

Not an L1/L2/... cache flush, but a store buffer flush, at least on x86. This is true for LOCK instructions. Plain loads/stores (again on x86) are always acquire/release, so they don't need additional fences if you don't need seq-cst. However, seq-cst atomics in C++ lower stores to LOCK XCHG, so you get a fence.
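To make the distinction concrete, here is a minimal sketch of the two kinds of store being compared. The names `flag`, `store_seq_cst`, `store_release` and `load_acquire` are made up for illustration, and the comments about generated instructions describe what mainstream x86-64 compilers typically emit, not something the C++ standard guarantees.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, used only for this illustration.
std::atomic<int> flag{0};

// The default ordering is memory_order_seq_cst. On x86-64, compilers
// typically lower this store to an xchg (implicitly locked) or to
// mov + mfence, which drains the store buffer.
void store_seq_cst(int v) {
    flag.store(v);
}

// A release store needs no extra fence on x86-64; it is typically
// compiled to a plain mov.
void store_release(int v) {
    flag.store(v, std::memory_order_release);
}

// An acquire load is likewise typically a plain mov on x86-64.
int load_acquire() {
    return flag.load(std::memory_order_acquire);
}
```

Comparing the compiler output for `store_seq_cst` and `store_release` (for example with `-O2 -S`) is a quick way to check what a particular toolchain actually does.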

replies(1): >>45083031 #

tialaramex ◴[31 Aug 25 13:30 UTC] No.45083031[source]▶

>>45082876 #

There is no way the shared_ptr<T> is using the expensive sequentially consistent atomic operations.

Even if you're one of the crazy people who thinks that's the sane default, the value from analysing and choosing a better ordering rule for this key type is enormous and when you do that analysis your answer is going to be acquire-release and only for some edge cases, in many places the relaxed atomic ordering is fine.

replies(3): >>45083183 #>>45084954 #>>45097016 #

loeg ◴[31 Aug 25 13:57 UTC] No.45083183[source]▶

>>45083031 #

> when you do that analysis your answer is going to be acquire-release and only for some edge cases, in many places the relaxed atomic ordering is fine.

Why would shared_ptr refcounting need anything other than relaxed? Acq/rel are for implementing multi-variable atomic protocols, and shared_ptr refcounting simply doesn't have other variables.

replies(3): >>45083552 #>>45083875 #>>45084525 #

tialaramex ◴[31 Aug 25 14:47 UTC] No.45083552[source]▶

>>45083183 #

It's extremely difficult to see in real C++ standard library source because of the layers of obfuscating compiler-workaround hacks, but they do in fact end up using acquire-release ordering, and only for decrementing the reference count. Does that help you figure out why we want acquire-release, or do you need more help?
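For readers following along, a minimal sketch of the pattern being described: relaxed increment, acquire-release decrement. This is not any standard library's actual code; `ControlBlock`, `acquire_ref` and `release_ref` are hypothetical names, and a real shared_ptr control block also tracks weak references and a deleter.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified control block (illustration only).
struct ControlBlock {
    std::atomic<long> refs{1};   // starts with one owner
    int* payload = nullptr;      // the managed object
};

// Taking another reference: the caller already holds one, so nothing
// else has to be ordered against this increment. Relaxed is enough.
inline void acquire_ref(ControlBlock* cb) {
    cb->refs.fetch_add(1, std::memory_order_relaxed);
}

// Dropping a reference. The release half publishes this thread's earlier
// writes to the payload; the acquire half makes sure the thread that
// drops the count to zero sees every other owner's writes before it
// destroys the object. Returns true if this call destroyed the object.
inline bool release_ref(ControlBlock* cb) {
    if (cb->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete cb->payload;
        delete cb;
        return true;
    }
    return false;
}
```

The "other variable" in the protocol is the managed object itself: without the release/acquire pairing on the decrement, the destructor could run on one thread without observing writes another thread made just before dropping its reference.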
