
51 points klaussilveira | 1 comment | source
sesuximo ◴[] No.45082719[source]
Why is the atomic version slower? Is it slower on modern x86?
replies(2): >>45082778 #>>45083133 #
eptcyka ◴[] No.45082778[source]
Atomic write operations force a cache line flush and can wait until the memory is updated. Atomic reads have to be read from memory or a shared cache. Atomics are slow because memory is slow.
replies(3): >>45082805 #>>45083625 #>>45085604 #
Krssst ◴[] No.45082805[source]
I don't think an atomic operation necessarily demands a cache flush. As I understand it, L1 cache lines can migrate between cores as needed (though perhaps not on multi-socket machines?). Barriers are only required if stronger memory-ordering guarantees are needed.
replies(1): >>45082876 #
ot ◴[] No.45082876[source]
Not an L1/L2/... cache flush, but a store buffer flush, at least on x86. This is true for LOCK-prefixed instructions. Plain loads and stores (again on x86) are already acquire/release, so they need no additional fences unless you want seq-cst. However, seq-cst atomic stores in C++ are lowered to LOCK XCHG, so there you do get a fence.
replies(1): >>45083031 #
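The x86 lowering ot describes can be sketched in a few lines; the two functions below behave identically at the source level, but a compiler targeting x86 typically emits a plain MOV for the release store and an XCHG (implicitly LOCKed) for the default seq-cst store. The function names are illustrative, not from any real codebase.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> flag{0};

// On x86 a release store compiles to an ordinary MOV: hardware stores
// already have release semantics, so no extra fence is needed.
void publish_release() {
    flag.store(1, std::memory_order_release);
}

// The default (seq-cst) store must additionally forbid StoreLoad
// reordering, so compilers lower it to XCHG, which is implicitly
// LOCKed and drains the store buffer before any later load.
void publish_seq_cst() {
    flag.store(1);  // memory_order_seq_cst is the default
}
```

The difference is invisible in single-threaded results; it shows up only in the generated code and in cross-thread ordering guarantees.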
tialaramex ◴[] No.45083031[source]
There is no way the shared_ptr<T> is using the expensive sequentially consistent atomic operations.

Even if you're one of the crazy people who thinks that's the sane default, the value of analysing and choosing a better ordering rule for this key type is enormous. When you do that analysis, your answer is going to be acquire-release, and only for some edge cases; in many places relaxed atomic ordering is fine.

replies(3): >>45083183 #>>45084954 #>>45097016 #
loeg ◴[] No.45083183[source]
> when you do that analysis your answer is going to be acquire-release and only for some edge cases, in many places the relaxed atomic ordering is fine.

Why would shared_ptr refcounting need anything other than relaxed? Acq/rel are for implementing multi-variable atomic protocols, and shared_ptr refcounting simply doesn't have other variables.

replies(3): >>45083552 #>>45083875 #>>45084525 #
dataflow ◴[] No.45083875[source]
It's because you're not solely managing the refcount here. Other memory locations depend on the refcount, since you also delete the object once the refcount reaches zero. That means all writes to the object must have completed by that point, and the deleting thread must observe them. Otherwise you might destroy an object while it's in an invalid state, or release the memory while another thread is still accessing it.
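This is why real refcount implementations (libstdc++'s `shared_ptr` among them) use relaxed for the increment but release for the decrement, with an acquire fence before deletion. A minimal sketch, with a hypothetical `Counted` control block standing in for the real thing:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical control block for illustration; real shared_ptr control
// blocks also carry a weak count, deleter, etc.
struct Counted {
    std::atomic<long> refcount{1};
    int payload = 42;
};

void acquire_ref(Counted* p) {
    // Relaxed is enough: holding an existing reference already keeps the
    // object alive, and no other memory is being published here.
    p->refcount.fetch_add(1, std::memory_order_relaxed);
}

void release_ref(Counted* p) {
    // Release on the decrement: all of this thread's writes to the object
    // happen-before the count reaching zero, wherever that is observed.
    if (p->refcount.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire fence pairs with the release decrements, so the deleting
        // thread sees every other thread's writes before destruction.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete p;
    }
}
```

The increment stays cheap on every architecture; only the final decrement-to-zero path pays for the synchronization, which is exactly where the dependence on other memory locations kicks in.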