51 points klaussilveira | 1 comments
sesuximo ◴[] No.45082719[source]
Why is the atomic version slower? Is it slower on modern x86?
replies(2): >>45082778 #>>45083133 #
eptcyka ◴[] No.45082778[source]
Atomic write operations force a cache line flush and can wait until the memory is updated. Atomic reads have to be read from memory or a shared cache. Atomics are slow because memory is slow.
replies(3): >>45082805 #>>45083625 #>>45085604 #
Krssst ◴[] No.45082805[source]
I don't think an atomic operation necessarily demands a cache flush. As I understand it, L1 cache lines can move across cores as needed (maybe not on multi-socket machines?). Barriers are required only if further memory ordering guarantees are needed.
replies(1): >>45082876 #
ot ◴[] No.45082876[source]
Not an L1/L2/... cache flush, but a store buffer flush, at least on x86. This is true for LOCK-prefixed instructions. Loads/stores (again on x86) always have acquire/release semantics, so they don't need additional fences if you don't need seq-cst. However, seq-cst atomics in C++ lower stores to LOCK XCHG, so you get a fence.
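
To make the release vs. seq-cst distinction concrete, here is a minimal C++ sketch. The names (`flag`, `store_release`, `store_seq_cst`, `load_acquire`) are made up for illustration, and the instruction choices in the comments describe what mainstream compilers typically emit on x86, not a guarantee; check your own compiler's output.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, for illustration only.
std::atomic<int> flag{0};

// Release store: on x86 this typically compiles to a plain MOV,
// because ordinary x86 stores already have release semantics.
void store_release(int v) {
    flag.store(v, std::memory_order_release);
}

// Sequentially consistent store (the default ordering when none is given).
// On x86, compilers typically emit XCHG (implicitly locked) or MOV + MFENCE,
// either of which drains the store buffer.
void store_seq_cst(int v) {
    flag.store(v);  // equivalent to std::memory_order_seq_cst
}

// Acquire load: typically a plain MOV on x86.
int load_acquire() {
    return flag.load(std::memory_order_acquire);
}
```

Both stores leave the same value in `flag`; the difference is only in the ordering guarantee and, on x86, in the cost of the emitted instruction.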
replies(1): >>45083031 #
tialaramex ◴[] No.45083031[source]
There is no way the shared_ptr<T> is using the expensive sequentially consistent atomic operations.

Even if you're one of the crazy people who thinks that's the sane default, the value from analysing and choosing a better ordering rule for this key type is enormous and when you do that analysis your answer is going to be acquire-release and only for some edge cases, in many places the relaxed atomic ordering is fine.

replies(3): >>45083183 #>>45084954 #>>45097016 #
loeg ◴[] No.45083183[source]
> when you do that analysis your answer is going to be acquire-release and only for some edge cases, in many places the relaxed atomic ordering is fine.

Why would shared_ptr refcounting need anything other than relaxed? Acq/rel are for implementing multi-variable atomic protocols, and shared_ptr refcounting simply doesn't have other variables.

replies(3): >>45083552 #>>45083875 #>>45084525 #
1. tialaramex ◴[] No.45083552[source]
It's extremely difficult to see in real C++ standard library source because of the layers of obfuscating compiler workaround hacks, but eventually they are in fact using acquire-release ordering, though only for decrementing the reference count. Does that help you figure out why we want acquire-release, or do you need more help?
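
A rough sketch of the pattern being described, assuming a hand-rolled control block. `Counted`, `acquire_ref` and `release_ref` are hypothetical names, and this is not any standard library's actual code. The increment can be relaxed because the caller already holds a reference; the decrement uses acquire-release so that the thread that ends up deleting the object sees every write other owners made before they dropped their references.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count, for illustration only.
struct Counted {
    std::atomic<long> refs{1};  // the creator holds the first reference
    int payload = 0;
};

// Taking another reference: relaxed is enough, since the caller already
// owns a reference and nothing is published by the increment.
void acquire_ref(Counted* c) {
    c->refs.fetch_add(1, std::memory_order_relaxed);
}

// Dropping a reference. The release half orders this thread's earlier
// writes to the object before the decrement; the acquire half lets the
// thread performing the final decrement see the writes of all other
// owners before it runs `delete`.
// Returns true if this call destroyed the object.
bool release_ref(Counted* c) {
    if (c->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete c;
        return true;
    }
    return false;
}
```

An equivalent formulation uses a release decrement followed by an acquire fence only on the path that deletes, which avoids the acquire cost on non-final decrements.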